* [PATCH] multifd: Make the main thread yield periodically to the main loop
@ 2025-08-07  2:41 yong.huang
  2025-08-07  9:32 ` Lukas Straub
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: yong.huang @ 2025-08-07  2:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, yong.huang

From: Hyman Huang <yong.huang@smartx.com>

When there are network issues such as missing TCP ACKs on the send
side during multifd live migration, the error "Connection timed out"
is raised on the send side and the source QEMU process stops sending
data. On the receive side, the IO channels may be blocked at
recvmsg(), so the main loop gets stuck and consequently fails to
respond to QMP commands.

The QEMU backtrace at the receive side with the main thread and two
multi-channel threads is displayed as follows:

multifd thread 2:
Thread 10 (Thread 0x7fd24d5fd700 (LWP 1413634)):
0  0x00007fd46066d157 in __libc_recvmsg (fd=46, msg=msg@entry=0x7fd24d5fc530, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:28
1  0x00005556d52ffb1b in qio_channel_socket_readv (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, flags=<optimized out>, errp=0x7fd24d5fc6f8) at ../io/channel-socket.c:513
2  0x00005556d530561f in qio_channel_readv_full_all_eof (ioc=0x5556d76db290, iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=errp@entry=0x7fd24d5fc6f8) at ../io/channel.c:142
3  0x00005556d53057d9 in qio_channel_readv_full_all (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, fds=<optimized out>, nfds=<optimized out>, errp=0x7fd24d5fc6f8) at ../io/channel.c:210
4  0x00005556d4fa4fc9 in multifd_recv_thread (opaque=opaque@entry=0x5556d7affa60) at ../migration/multifd.c:1113
5  0x00005556d5414826 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
6  0x00007fd460662f1b in start_thread (arg=0x7fd24d5fd700) at pthread_create.c:486
7  0x00007fd46059a1a0 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:98

multifd thread 1:
Thread 9 (Thread 0x7fd24ddfe700 (LWP 1413633)):
0  0x00007fd46066d157 in __libc_recvmsg (fd=44, msg=msg@entry=0x7fd24ddfd530, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:28
1  0x00005556d52ffb1b in qio_channel_socket_readv (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, flags=<optimized out>, errp=0x7fd24ddfd6f8) at ../io/channel-socket.c:513
2  0x00005556d530561f in qio_channel_readv_full_all_eof (ioc=0x5556d76dc600, iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=errp@entry=0x7fd24ddfd6f8) at ../io/channel.c:142
3  0x00005556d53057d9 in qio_channel_readv_full_all (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, fds=<optimized out>, nfds=<optimized out>, errp=0x7fd24ddfd6f8) at ../io/channel.c:210
4  0x00005556d4fa4fc9 in multifd_recv_thread (opaque=opaque@entry=0x5556d7aff990) at ../migration/multifd.c:1113
5  0x00005556d5414826 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
6  0x00007fd460662f1b in start_thread (arg=0x7fd24ddfe700) at pthread_create.c:486
7  0x00007fd46059a1a0 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:98

main thread:
Thread 1 (Thread 0x7fd45f1fbe40 (LWP 1413088)):
0  0x00007fd46066b616 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x5556d7604e80) at ../sysdeps/unix/sysv/linux/futex-internal.h:216
1  do_futex_wait (sem=sem@entry=0x5556d7604e80, abstime=0x0) at sem_waitcommon.c:111
2  0x00007fd46066b708 in __new_sem_wait_slow (sem=sem@entry=0x5556d7604e80, abstime=0x0) at sem_waitcommon.c:183
3  0x00007fd46066b779 in __new_sem_wait (sem=sem@entry=0x5556d7604e80) at sem_wait.c:42
4  0x00005556d5415524 in qemu_sem_wait (sem=0x5556d7604e80) at ../util/qemu-thread-posix.c:358
5  0x00005556d4fa5e99 in multifd_recv_sync_main () at ../migration/multifd.c:1052
6  0x00005556d521ed65 in ram_load_precopy (f=f@entry=0x5556d75dfb90) at ../migration/ram.c:4446
7  0x00005556d521f1dd in ram_load (f=0x5556d75dfb90, opaque=<optimized out>, version_id=4) at ../migration/ram.c:4495
8  0x00005556d4faa3e7 in vmstate_load (f=f@entry=0x5556d75dfb90, se=se@entry=0x5556d6083070) at ../migration/savevm.c:909
9  0x00005556d4fae7a0 in qemu_loadvm_section_part_end (mis=0x5556d6082cc0, f=0x5556d75dfb90) at ../migration/savevm.c:2475
10 qemu_loadvm_state_main (f=f@entry=0x5556d75dfb90, mis=mis@entry=0x5556d6082cc0) at ../migration/savevm.c:2634
11 0x00005556d4fafbd5 in qemu_loadvm_state (f=0x5556d75dfb90) at ../migration/savevm.c:2706
12 0x00005556d4f9ebdb in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:561
13 0x00005556d542513b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:186
14 0x00007fd4604ef970 in ?? () from target:/lib64/libc.so.6

Once the QEMU process falls into the above state in the presence of
such network errors, live migration cannot be canceled gracefully,
leaving the destination VM in the "paused" state, since the QEMU
process on the destination side does not respond to the QMP command
"migrate_cancel".

To fix that, make the main thread yield to the main loop after it has
waited too long for the multifd channels to finish receiving data
during one iteration. A timeout of 10 seconds is sufficient.

Signed-off-by: Hyman Huang <yong.huang@smartx.com>
---
 migration/multifd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index b255778855..aca0aeb341 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1228,6 +1228,16 @@ void multifd_recv_sync_main(void)
             }
         }
         trace_multifd_recv_sync_main_signal(p->id);
+        do {
+            if (qemu_sem_timedwait(&multifd_recv_state->sem_sync, 10000) == 0) {
+                break;
+            }
+            if (qemu_in_coroutine()) {
+                aio_co_schedule(qemu_get_current_aio_context(),
+                                qemu_coroutine_self());
+                qemu_coroutine_yield();
+            }
+        } while (1);
         qemu_sem_post(&p->sem_sync);
     }
     trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
-- 
2.27.0




* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-07  2:41 [PATCH] multifd: Make the main thread yield periodically to the main loop yong.huang
@ 2025-08-07  9:32 ` Lukas Straub
  2025-08-07  9:36 ` Lukas Straub
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Lukas Straub @ 2025-08-07  9:32 UTC (permalink / raw)
  To: qemu-devel


On Thu,  7 Aug 2025 10:41:17 +0800
yong.huang@smartx.com wrote:

> From: Hyman Huang <yong.huang@smartx.com>
> 
> When there are network issues such as missing TCP ACKs on the send
> side during multifd live migration, the error "Connection timed out"
> is raised on the send side and the source QEMU process stops sending
> data. On the receive side, the IO channels may be blocked at
> recvmsg(), so the main loop gets stuck and consequently fails to
> respond to QMP commands.
> ...

Hi Hyman Huang,

Have you tried the 'yank' command to shut down the sockets? It is
meant exactly to recover from such hangs and should solve your issue.

https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature

Best regards,
Lukas Straub


* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-07  2:41 [PATCH] multifd: Make the main thread yield periodically to the main loop yong.huang
  2025-08-07  9:32 ` Lukas Straub
@ 2025-08-07  9:36 ` Lukas Straub
  2025-08-08  2:36   ` Yong Huang
  2025-08-08  6:36 ` Yong Huang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Lukas Straub @ 2025-08-07  9:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, yong.huang


On Thu,  7 Aug 2025 10:41:17 +0800
yong.huang@smartx.com wrote:

> From: Hyman Huang <yong.huang@smartx.com>
> 
> When there are network issues such as missing TCP ACKs on the send
> side during multifd live migration, the error "Connection timed out"
> is raised on the send side and the source QEMU process stops sending
> data. On the receive side, the IO channels may be blocked at
> recvmsg(), so the main loop gets stuck and consequently fails to
> respond to QMP commands.
> ...

Hi Hyman Huang,

Have you tried the 'yank' command to shut down the sockets? It is
meant exactly to recover from such hangs and should solve your issue.

https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature

Best regards,
Lukas Straub


* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-07  9:36 ` Lukas Straub
@ 2025-08-08  2:36   ` Yong Huang
  2025-08-08  7:01     ` Lukas Straub
  0 siblings, 1 reply; 20+ messages in thread
From: Yong Huang @ 2025-08-08  2:36 UTC (permalink / raw)
  To: Lukas Straub; +Cc: qemu-devel, Peter Xu, Fabiano Rosas


On Thu, Aug 7, 2025 at 5:36 PM Lukas Straub <lukasstraub2@web.de> wrote:

> On Thu,  7 Aug 2025 10:41:17 +0800
> yong.huang@smartx.com wrote:
>
> > From: Hyman Huang <yong.huang@smartx.com>
> >
> > When there are network issues such as missing TCP ACKs on the send
> > side during multifd live migration, the error "Connection timed out"
> > is raised on the send side and the source QEMU process stops sending
> > data. On the receive side, the IO channels may be blocked at
> > recvmsg(), so the main loop gets stuck and consequently fails to
> > respond to QMP commands.
> > ...
>
> Hi Hyman Huang,
>
> Have you tried the 'yank' command to shut down the sockets? It is
> meant exactly to recover from such hangs and should solve your issue.
>
> https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature


Thanks for the comment and advice.

Let me give more details about the migration state when the issue happens:

On the source side, libvirt has already aborted the migration job:

$ virsh domjobinfo fdecd242-f278-4308-8c3b-46e144e55f63
Job type:         Failed
Operation:        Outgoing migration

QMP query-yank shows that there is no migration yank instance:

$ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
'{"execute":"query-yank"}' --pretty
{
  "return": [
    {
      "type": "chardev",
      "id": "charmonitor"
    },
    {
      "type": "chardev",
      "id": "charchannel0"
    },
    {
      "type": "chardev",
      "id": "libvirt-2-virtio-format"
    }
  ],
  "id": "libvirt-5217"
}

The libvirt migration job is stuck, as the following backtrace shows:
migration is waiting for the "Finish" RPC on the destination side to
return.

#0  0x00007f4c93d086c9 in __GI___poll (fds=0x7f4c50000d20, nfds=2,
timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f4c93e99379 in ?? () from /lib64/libglib-2.0.so.0
#2  0x00007f4c93e996c2 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#3  0x00007f4c94aac92a in virNetClientIOEventLoop
(client=client@entry=0x7f4c501a3ef0,
thiscall=thiscall@entry=0x7f4c50052a90) at
../../src/rpc/virnetclient.c:1684
#4  0x00007f4c94aacf59 in virNetClientIO (thiscall=0x7f4c50052a90,
client=0x7f4c501a3ef0) at ../../src/rpc/virnetclient.c:1952
#5  virNetClientSendInternal (client=client@entry=0x7f4c501a3ef0,
msg=msg@entry=0x7f4c501a2150, expectReply=expectReply@entry=true,
nonBlock=nonBlock@entry=false) at ../../src/rpc/virnetclient.c:2123
#6  0x00007f4c94aae793 in virNetClientSendWithReply
(client=client@entry=0x7f4c501a3ef0, msg=msg@entry=0x7f4c501a2150) at
../../src/rpc/virnetclient.c:2151
#7  0x00007f4c94aa9460 in virNetClientProgramCall
(prog=prog@entry=0x7f4c50066870, client=client@entry=0x7f4c501a3ef0,
serial=serial@entry=10, proc=proc@entry=306, noutfds=noutfds@entry=0,
    outfds=outfds@entry=0x0, ninfds=0x0, infds=0x0,
args_filter=0x7f4c94af1290
<xdr_remote_domain_migrate_finish3_params_args>, args=0x7f4c8487e300,
    ret_filter=0x7f4c94af1310
<xdr_remote_domain_migrate_finish3_params_ret>, ret=0x7f4c8487e350) at
../../src/rpc/virnetclientprogram.c:324
#8  0x00007f4c94acb2e4 in callFull (priv=priv@entry=0x7f4c5004c800,
flags=flags@entry=0, fdin=fdin@entry=0x0, fdinlen=fdinlen@entry=0,
fdout=fdout@entry=0x0, fdoutlen=fdoutlen@entry=0x0, proc_nr=306,
    args_filter=0x7f4c94af1290
<xdr_remote_domain_migrate_finish3_params_args>, args=0x7f4c8487e300
"\004", ret_filter=0x7f4c94af1310
<xdr_remote_domain_migrate_finish3_params_ret>,
    ret=0x7f4c8487e350 "", conn=0x7f4c5007c900) at
../../src/remote/remote_driver.c:6754
#9  0x00007f4c94ae20f8 in call (conn=0x7f4c5007c900,
ret=0x7f4c8487e350 "", ret_filter=<optimized out>, args=0x7f4c8487e300
"\004", args_filter=<optimized out>, proc_nr=306, flags=0,
    priv=<optimized out>) at ../../src/remote/remote_driver.c:6776
#10 remoteDomainMigrateFinish3Params (dconn=0x7f4c5007c900,
params=<optimized out>, nparams=4, cookiein=0x0, cookieinlen=0,
cookieout=0x7f4c8487e4e0, cookieoutlen=0x7f4c8487e4b4, flags=131611,
    cancelled=1) at ../../src/remote/remote_driver.c:7362 // the RPC that calls
the destination-side Finish API, blocking until it returns
#11 0x00007f4c74d44600 in qemuMigrationSrcPerformPeer2Peer3
(flags=<optimized out>, useParams=<optimized out>, bandwidth=0,
migParams=0x7f4c5002b540, nbdPort=0, migrate_disks=<optimized out>,
    nmigrate_disks=0, listenAddress=<optimized out>, graphicsuri=0x0,
uri=<optimized out>, dname=0x0,
    persist_xml=0x7f4c5006f720 "<?xml version=\"1.0\"
encoding=\"utf-8\"?>\n<domain type=\"kvm\"
xmlns:qemu=\"http://libvirt.org/schemas/domain/qemu/1.0\"><name>5cdd670f-ac55-4820-a66e-b6e3985e1520</name><uuid>dd053ff0-5e12-44f5-9b97-1826715"...,
xmlin=<optimized out>, vm=0x7f4c38257de0, dconnuri=0x7f4c5000f840
"qemu+tls://172.16.170.52/system?no_verify=1", dconn=0x7f4c5007c900,
sconn=0x7f4c0000fb70,
    driver=0x7f4c3814e4b0) at ../../src/qemu/qemu_migration.c:4512
#12 qemuMigrationSrcPerformPeer2Peer (v3proto=<synthetic pointer>,
resource=0, dname=0x0, flags=<optimized out>,
migParams=0x7f4c5002b540, nbdPort=0, migrate_disks=<optimized out>,
nmigrate_disks=0,
    listenAddress=<optimized out>, graphicsuri=0x0, uri=<optimized
out>, dconnuri=0x7f4c5000f840
"qemu+tls://172.16.170.52/system?no_verify=1",
    persist_xml=0x7f4c5006f720 "<?xml version=\"1.0\"
encoding=\"utf-8\"?>\n<domain type=\"kvm\"
xmlns:qemu=\"http://libvirt.org/schemas/domain/qemu/1.0\"><name>5cdd670f-ac55-4820-a66e-b6e3985e1520</name><uuid>dd053ff0-5e12-44f5-9b97-1826715"...,
xmlin=<optimized out>, vm=0x7f4c38257de0, sconn=0x7f4c0000fb70,
driver=0x7f4c3814e4b0) at ../../src/qemu/qemu_migration.c:4767
#13 qemuMigrationSrcPerformJob (driver=driver@entry=0x7f4c3814e4b0,
conn=conn@entry=0x7f4c0000fb70, vm=vm@entry=0x7f4c38257de0,
    xmlin=xmlin@entry=0x7f4c50026f80 "<?xml version=\"1.0\"
encoding=\"utf-8\"?>\n<domain type=\"kvm\"
xmlns:qemu=\"http://libvirt.org/schemas/domain/qemu/1.0\"><name>5cdd670f-ac55-4820-a66e-b6e3985e1520</name><uuid>dd053ff0-5e12-44f5-9b97-1826715"...,
    persist_xml=persist_xml@entry=0x7f4c5006f720 "<?xml
version=\"1.0\" encoding=\"utf-8\"?>\n<domain type=\"kvm\"
xmlns:qemu=\"http://libvirt.org/schemas/domain/qemu/1.0\"><name>5cdd670f-ac55-4820-a66e-b6e3985e1520</name><uuid>dd053ff0-5e12-44f5-9b97-1826715"...,
dconnuri=0x7f4c5000f840 "qemu+tls://172.16.170.52/system?no_verify=1",
uri=0x7f4c501a1430 "tcp://172.16.170.52", graphicsuri=0x0,
    listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0,
migParams=0x7f4c5002b540, cookiein=0x0, cookieinlen=0,
cookieout=0x7f4c8487e8c8, cookieoutlen=0x7f4c8487e8bc,
flags=1073885723,
    dname=0x0, resource=0, v3proto=<optimized out>) at
../../src/qemu/qemu_migration.c:4842
#14 0x00007f4c74d44c6c in qemuMigrationSrcPerform
(driver=driver@entry=0x7f4c3814e4b0, conn=0x7f4c0000fb70,
vm=0x7f4c38257de0,
    xmlin=0x7f4c50026f80 "<?xml version=\"1.0\"
encoding=\"utf-8\"?>\n<domain type=\"kvm\"
xmlns:qemu=\"http://libvirt.org/schemas/domain/qemu/1.0\"><name>5cdd670f-ac55-4820-a66e-b6e3985e1520</name><uuid>dd053ff0-5e12-44f5-9b97-1826715"...,
    persist_xml=0x7f4c5006f720 "<?xml version=\"1.0\"
encoding=\"utf-8\"?>\n<domain type=\"kvm\"
xmlns:qemu=\"http://libvirt.org/schemas/domain/qemu/1.0\"><name>5cdd670f-ac55-4820-a66e-b6e3985e1520</name><uuid>dd053ff0-5e12-44f5-9b97-1826715"...,
dconnuri=dconnuri@entry=0x7f4c5000f840
"qemu+tls://172.16.170.52/system?no_verify=1", uri=0x7f4c501a1430
"tcp://172.16.170.52", graphicsuri=0x0,
    listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0,
migParams=0x7f4c5002b540, cookiein=0x0, cookieinlen=0,
cookieout=0x7f4c8487e8c8, cookieoutlen=0x7f4c8487e8bc,
flags=1073885723,
    dname=0x0, resource=0, v3proto=true) at ../../src/qemu/qemu_migration.c:5030
#15 0x00007f4c74d769e0 in qemuDomainMigratePerform3Params
(dom=0x7f4c5019bfe0, dconnuri=0x7f4c5000f840
"qemu+tls://172.16.170.52/system?no_verify=1", params=<optimized out>,
nparams=<optimized out>,
    cookiein=0x0, cookieinlen=0, cookieout=0x7f4c8487e8c8,
cookieoutlen=0x7f4c8487e8bc, flags=1073885723) at
../../src/qemu/qemu_driver.c:12730
#16 0x00007f4c94b072e8 in virDomainMigratePerform3Params
(domain=domain@entry=0x7f4c5019bfe0, dconnuri=0x7f4c5000f840
"qemu+tls://172.16.170.52/system?no_verify=1", params=0x7f4c500926d0,
nparams=4,
    cookiein=0x0, cookieinlen=0, cookieout=0x7f4c8487e8c8,
cookieoutlen=0x7f4c8487e8bc, flags=1073885723) at
../../src/libvirt-domain.c:4989
#17 0x000055b881c3fb1e in remoteDispatchDomainMigratePerform3Params
(server=0x55b881dbba70, msg=0x55b881df23d0, ret=0x7f4c50054210,
args=0x7f4c5019ff10, rerr=0x7f4c8487e9c0, client=<optimized out>)
    at ../../src/remote/remote_daemon_dispatch.c:5736
#18 remoteDispatchDomainMigratePerform3ParamsHelper
(server=0x55b881dbba70, client=<optimized out>, msg=0x55b881df23d0,
rerr=0x7f4c8487e9c0, args=0x7f4c5019ff10, ret=0x7f4c50054210)
    at ./remote/remote_daemon_dispatch_stubs.h:8805
--Type <RET> for more, q to quit, c to continue without paging--
#19 0x00007f4c94aa242d in virNetServerProgramDispatchCall
(msg=0x55b881df23d0, client=0x55b881e0f740, server=0x55b881dbba70,
prog=0x55b881dc8750) at ../../src/rpc/virnetserverprogram.c:430
#20 virNetServerProgramDispatch (prog=0x55b881dc8750,
server=server@entry=0x55b881dbba70, client=0x55b881e0f740,
msg=0x55b881df23d0) at ../../src/rpc/virnetserverprogram.c:302
#21 0x00007f4c94aa73c2 in virNetServerProcessMsg (msg=<optimized out>,
prog=<optimized out>, client=<optimized out>, srv=0x55b881dbba70) at
../../src/rpc/virnetserver.c:137
#22 virNetServerHandleJob (jobOpaque=0x55b881de8140,
opaque=0x55b881dbba70) at ../../src/rpc/virnetserver.c:154
#23 0x00007f4c949bbf80 in virThreadPoolWorker (opaque=<optimized out>)
at ../../src/util/virthreadpool.c:163
#24 0x00007f4c949bb5b7 in virThreadHelper (data=<optimized out>) at
../../src/util/virthread.c:233
#25 0x00007f4c93dfbf1b in start_thread (arg=0x7f4c8487f700) at
pthread_create.c:486
#26 0x00007f4c93d131a0 in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:98


Meanwhile, on the destination side, libvirt shows a "paused" VM:

$ virsh list
 Id   Name                                   State
-----------------------------------------------------
 31   fdecd242-f278-4308-8c3b-46e144e55f63   paused

Libvirt is stuck with the following backtrace. It shows that libvirt is
querying the VM status by issuing the QMP command "query-status" before
killing the VM. The relevant piece of code is:

qemuMigrationDstFinish:
    if (retcode != 0) {
        /* Check for a possible error on the monitor in case Finish was called
         * earlier than monitor EOF handler got a chance to process the error
         */
        qemuDomainCheckMonitor(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN);
        goto endjob;
    }

Thread 2 (Thread 0x7f1161c6c700 (LWP 3244)):
#0  0x00007f116f9eba0c in futex_wait_cancelable (private=<optimized
out>, expected=0, futex_word=0x7f1138068550) at
../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0,
mutex=0x7f1138068500, cond=0x7f1138068528) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7f1138068528,
mutex=mutex@entry=0x7f1138068500) at pthread_cond_wait.c:638
#3  0x00007f11705a5476 in virCondWait (c=c@entry=0x7f1138068528,
m=m@entry=0x7f1138068500) at ../../src/util/virthread.c:148
#4  0x00007f116013fbfc in qemuMonitorSend
(mon=mon@entry=0x7f11380684f0, msg=msg@entry=0x7f1161c6b600) at
../../src/qemu/qemu_monitor.c:953
#5  0x00007f116014fde5 in qemuMonitorJSONCommandWithFd
(mon=mon@entry=0x7f11380684f0, cmd=cmd@entry=0x7f115c0512e0,
scm_fd=scm_fd@entry=-1, reply=reply@entry=0x7f1161c6b680) at
../../src/qemu/qemu_monitor_json.c:358
#6  0x00007f1160152025 in qemuMonitorJSONCommand
(reply=0x7f1161c6b680, cmd=0x7f115c0512e0, mon=0x7f11380684f0) at
../../src/qemu/qemu_monitor_json.c:383
#7  qemuMonitorJSONGetStatus (mon=0x7f11380684f0,
running=0x7f1161c6b6c7, reason=0x0) at
../../src/qemu/qemu_monitor_json.c:1740
#8  0x00007f1160141a80 in qemuMonitorCheck (mon=<optimized out>) at
../../src/qemu/qemu_monitor.c:1633
#9  0x00007f11600f0d87 in qemuDomainCheckMonitor
(driver=driver@entry=0x7f11141273b0, vm=vm@entry=0x7f1138135920,
asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_MIGRATION_IN) at
../../src/qemu/qemu_domain.c:14393
#10 0x00007f1160133d18 in qemuMigrationDstFinish
(driver=driver@entry=0x7f11141273b0, dconn=dconn@entry=0x7f1134012000,
vm=<optimized out>, cookiein=cookiein@entry=0x0,
cookieinlen=cookieinlen@entry=0,
cookieout=cookieout@entry=0x7f1161c6b8f0, cookieoutlen=0x7f1161c6b8e4,
flags=131611, retcode=1, v3proto=true) at
../../src/qemu/qemu_migration.c:5211
#11 0x00007f116016a436 in qemuDomainMigrateFinish3Params
(dconn=0x7f1134012000, params=0x7f115c05e9e0, nparams=4, cookiein=0x0,
cookieinlen=0, cookieout=0x7f1161c6b8f0, cookieoutlen=0x7f1161c6b8e4,
flags=131611, cancelled=1) at ../../src/qemu/qemu_driver.c:12827
#12 0x00007f11706f15bb in virDomainMigrateFinish3Params
(dconn=<optimized out>, params=0x7f115c05e9e0, nparams=4,
cookiein=0x0, cookieinlen=0, cookieout=0x7f1161c6b8f0,
cookieoutlen=0x7f1161c6b8e4, flags=131611, cancelled=1) at
../../src/libvirt-domain.c:5033
#13 0x000055ffa89cf8c0 in ?? ()
#14 0x00007f117068c42d in virNetServerProgramDispatchCall
(msg=0x55ffaa6a3ec0, client=0x55ffaa6b4840, server=0x55ffaa682030,
prog=0x55ffaa68f760) at ../../src/rpc/virnetserverprogram.c:430
#15 virNetServerProgramDispatch (prog=0x55ffaa68f760,
server=server@entry=0x55ffaa682030, client=0x55ffaa6b4840,
msg=0x55ffaa6a3ec0) at ../../src/rpc/virnetserverprogram.c:302
#16 0x00007f11706913c2 in virNetServerProcessMsg (msg=<optimized out>,
prog=<optimized out>, client=<optimized out>, srv=0x55ffaa682030) at
../../src/rpc/virnetserver.c:137
#17 virNetServerHandleJob (jobOpaque=0x55ffaa669af0,
opaque=0x55ffaa682030) at ../../src/rpc/virnetserver.c:154
#18 0x00007f11705a5f80 in virThreadPoolWorker (opaque=<optimized out>)
at ../../src/util/virthreadpool.c:163
#19 0x00007f11705a55b7 in virThreadHelper (data=<optimized out>) at
../../src/util/virthread.c:233
#20 0x00007f116f9e5f1b in start_thread (arg=0x7f1161c6c700) at
pthread_create.c:486
#21 0x00007f116f8fd1a0 in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:98
Thread 1 (Thread 0x7f116e9a1580 (LWP 2925)):
#0  0x00007f116f8f26c9 in __GI___poll (fds=0x55ffaa65f130, nfds=14,
timeout=4982) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f116fa83379 in ?? () from /lib64/libglib-2.0.so.0
#2  0x00007f116fa8348c in g_main_context_iteration () from
/lib64/libglib-2.0.so.0
#3  0x00007f117054cdc0 in virEventGLibRunOnce () at
../../src/util/vireventglib.c:533
#4  0x00007f117054c085 in virEventRunDefaultImpl () at
../../src/util/virevent.c:344
#5  0x00007f1170690bcd in virNetDaemonRun (dmn=0x55ffaa680d60) at
../../src/rpc/virnetdaemon.c:852
#6  0x000055ffa89c03bc in ?? ()
#7  0x00007f116f82ab27 in __libc_start_main (main=0x55ffa89be930,
argc=2, argv=0x7ffe19beea78, init=<optimized out>, fini=<optimized
out>, rtld_fini=<optimized out>, stack_end=0x7ffe19beea68) at
../csu/libc-start.c:308
#8  0x000055ffa89c06ba in ?? ()

IMHO, the key reason for the issue is that QEMU fails to run the main loop
and thus fails to respond to QMP, which is not what we usually expect.

Giving libvirt a window of time to issue a QMP command and kill the VM is
the ideal solution for this issue; it provides an automatic method.

I have not dug into the yank feature; perhaps it is helpful, but only as a
manual intervention?

After all, these two options are not mutually exclusive, I think.


>
> Best regards,
> Lukas Straub
>

Thanks,
Yong

-- 
Best regards


* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-07  2:41 [PATCH] multifd: Make the main thread yield periodically to the main loop yong.huang
  2025-08-07  9:32 ` Lukas Straub
  2025-08-07  9:36 ` Lukas Straub
@ 2025-08-08  6:36 ` Yong Huang
  2025-08-08 15:42 ` Peter Xu
  2025-08-19 10:19 ` Daniel P. Berrangé
  4 siblings, 0 replies; 20+ messages in thread
From: Yong Huang @ 2025-08-08  6:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas


On Thu, Aug 7, 2025 at 11:04 AM <yong.huang@smartx.com> wrote:

> From: Hyman Huang <yong.huang@smartx.com>
>
> When there are network issues such as missing TCP ACKs on the send
> side during multifd live migration, the error "Connection timed out"
> is raised on the send side and the source QEMU process stops sending
> data. On the receive side, the IO channels may be blocked at
> recvmsg(), so the main loop gets stuck and consequently fails to
> respond to QMP commands.
>
> The QEMU backtrace at the receive side with the main thread and two
> multi-channel threads is displayed as follows:
>
> multifd thread 2:
> Thread 10 (Thread 0x7fd24d5fd700 (LWP 1413634)):
> 0  0x00007fd46066d157 in __libc_recvmsg (fd=46, msg=msg@entry=0x7fd24d5fc530,
> flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:28
> 1  0x00005556d52ffb1b in qio_channel_socket_readv (ioc=<optimized out>,
> iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0,
> flags=<optimized out>, errp=0x7fd24d5fc6f8) at ../io/channel-socket.c:513
> 2  0x00005556d530561f in qio_channel_readv_full_all_eof
> (ioc=0x5556d76db290, iov=<optimized out>, niov=<optimized out>, fds=0x0,
> nfds=0x0, errp=errp@entry=0x7fd24d5fc6f8) at ../io/channel.c:142
> 3  0x00005556d53057d9 in qio_channel_readv_full_all (ioc=<optimized out>,
> iov=<optimized out>, niov=<optimized out>, fds=<optimized out>,
> nfds=<optimized out>, errp=0x7fd24d5fc6f8) at ../io/channel.c:210
> 4  0x00005556d4fa4fc9 in multifd_recv_thread (opaque=opaque@entry=0x5556d7affa60)
> at ../migration/multifd.c:1113
> 5  0x00005556d5414826 in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:556
> 6  0x00007fd460662f1b in start_thread (arg=0x7fd24d5fd700) at
> pthread_create.c:486
> 7  0x00007fd46059a1a0 in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:98
>
> multifd thread 1:
> Thread 9 (Thread 0x7fd24ddfe700 (LWP 1413633)):
> 0  0x00007fd46066d157 in __libc_recvmsg (fd=44, msg=msg@entry=0x7fd24ddfd530,
> flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:28
> 1  0x00005556d52ffb1b in qio_channel_socket_readv (ioc=<optimized out>,
> iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0,
> flags=<optimized out>, errp=0x7fd24ddfd6f8) at ../io/channel-socket.c:513
> 2  0x00005556d530561f in qio_channel_readv_full_all_eof
> (ioc=0x5556d76dc600, iov=<optimized out>, niov=<optimized out>, fds=0x0,
> nfds=0x0, errp=errp@entry=0x7fd24ddfd6f8) at ../io/channel.c:142
> 3  0x00005556d53057d9 in qio_channel_readv_full_all (ioc=<optimized out>,
> iov=<optimized out>, niov=<optimized out>, fds=<optimized out>,
> nfds=<optimized out>, errp=0x7fd24ddfd6f8) at ../io/channel.c:210
> 4  0x00005556d4fa4fc9 in multifd_recv_thread (opaque=opaque@entry=0x5556d7aff990)
> at ../migration/multifd.c:1113
> 5  0x00005556d5414826 in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:556
> 6  0x00007fd460662f1b in start_thread (arg=0x7fd24ddfe700) at
> pthread_create.c:486
> 7  0x00007fd46059a1a0 in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:98
>
> main thread:
> Thread 1 (Thread 0x7fd45f1fbe40 (LWP 1413088)):
> 0  0x00007fd46066b616 in futex_abstimed_wait_cancelable (private=0,
> abstime=0x0, clockid=0, expected=0, futex_word=0x5556d7604e80) at
> ../sysdeps/unix/sysv/linux/futex-internal.h:216
> 1  do_futex_wait (sem=sem@entry=0x5556d7604e80, abstime=0x0) at
> sem_waitcommon.c:111
> 2  0x00007fd46066b708 in __new_sem_wait_slow (sem=sem@entry=0x5556d7604e80,
> abstime=0x0) at sem_waitcommon.c:183
> 3  0x00007fd46066b779 in __new_sem_wait (sem=sem@entry=0x5556d7604e80) at
> sem_wait.c:42
> 4  0x00005556d5415524 in qemu_sem_wait (sem=0x5556d7604e80) at
> ../util/qemu-thread-posix.c:358
> 5  0x00005556d4fa5e99 in multifd_recv_sync_main () at
> ../migration/multifd.c:1052
> 6  0x00005556d521ed65 in ram_load_precopy (f=f@entry=0x5556d75dfb90) at
> ../migration/ram.c:4446
> 7  0x00005556d521f1dd in ram_load (f=0x5556d75dfb90, opaque=<optimized
> out>, version_id=4) at ../migration/ram.c:4495
> 8  0x00005556d4faa3e7 in vmstate_load (f=f@entry=0x5556d75dfb90,
> se=se@entry=0x5556d6083070) at ../migration/savevm.c:909
> 9  0x00005556d4fae7a0 in qemu_loadvm_section_part_end (mis=0x5556d6082cc0,
> f=0x5556d75dfb90) at ../migration/savevm.c:2475
> 10 qemu_loadvm_state_main (f=f@entry=0x5556d75dfb90, mis=mis@entry=0x5556d6082cc0)
> at ../migration/savevm.c:2634
> 11 0x00005556d4fafbd5 in qemu_loadvm_state (f=0x5556d75dfb90) at
> ../migration/savevm.c:2706
> 12 0x00005556d4f9ebdb in process_incoming_migration_co (opaque=<optimized
> out>) at ../migration/migration.c:561
> 13 0x00005556d542513b in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>) at ../util/coroutine-ucontext.c:186
> 14 0x00007fd4604ef970 in ?? () from target:/lib64/libc.so.6
>
> Once the QEMU process falls into the above state in the presence of
> the network errors, live migration cannot be canceled gracefully,
> leaving the destination VM in the "paused" state, since the QEMU
> process on the destination side doesn't respond to the QMP command
> "migrate_cancel".


Actually, in our case, QEMU on the destination side fails to respond to
the QMP command "query-status", not "migrate-cancel".
See the details in my reply to Lukas.

It was my mistake not to verify this; I'll fix the commit message in
the next version. :(


>
> To fix that, make the main thread yield to the main loop after waiting
> too long for the multi-channels to finish receiving data during one
> iteration. 10 seconds is a sufficient timeout period to set.
>
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
>  migration/multifd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index b255778855..aca0aeb341 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1228,6 +1228,16 @@ void multifd_recv_sync_main(void)
>              }
>          }
>          trace_multifd_recv_sync_main_signal(p->id);
> +        do {
> +            if (qemu_sem_timedwait(&multifd_recv_state->sem_sync, 10000)
> == 0) {
> +                break;
> +            }
> +            if (qemu_in_coroutine()) {
> +                aio_co_schedule(qemu_get_current_aio_context(),
> +                                qemu_coroutine_self());
> +                qemu_coroutine_yield();
> +            }
> +        } while (1);
>          qemu_sem_post(&p->sem_sync);
>      }
>      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
> --
> 2.27.0
>
> Thanks,
Yong

-- 
Best regards


* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-08  2:36   ` Yong Huang
@ 2025-08-08  7:01     ` Lukas Straub
  2025-08-08  8:02       ` Yong Huang
  0 siblings, 1 reply; 20+ messages in thread
From: Lukas Straub @ 2025-08-08  7:01 UTC (permalink / raw)
  To: Yong Huang; +Cc: qemu-devel, Peter Xu, Fabiano Rosas


On Fri, 8 Aug 2025 10:36:24 +0800
Yong Huang <yong.huang@smartx.com> wrote:

> On Thu, Aug 7, 2025 at 5:36 PM Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > On Thu,  7 Aug 2025 10:41:17 +0800
> > yong.huang@smartx.com wrote:
> >  
> > > From: Hyman Huang <yong.huang@smartx.com>
> > >
> > > When there are network issues like missing TCP ACKs on the send
> > > side during the multifd live migration. At the send side, the error
> > > "Connection timed out" is thrown out and source QEMU process stop
> > > sending data, at the receive side, The IO-channels may be blocked
> > > at recvmsg() and thus the main loop gets stuck and fails to respond
> > > to QMP commands consequently.
> > > ...  
> >
> > Hi Hyman Huang,
> >
> > Have you tried the 'yank' command to shut down the sockets? It is
> > meant exactly to recover from such hangs and should solve your issue.
> >
> > https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature  
> 
> 
> Thanks for the comment and advice.
> 
> Let me give more details about the migration state when the issue happens:
> 
> On the source side, libvirt has already aborted the migration job:
> 
> $ virsh domjobinfo fdecd242-f278-4308-8c3b-46e144e55f63
> Job type:         Failed
> Operation:        Outgoing migration
> 
> QMP query-yank shows that there is no migration yank instance:
> 
> $ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
> '{"execute":"query-yank"}' --pretty
> {
>   "return": [
>     {
>       "type": "chardev",
>       "id": "charmonitor"
>     },
>     {
>       "type": "chardev",
>       "id": "charchannel0"
>     },
>     {
>       "type": "chardev",
>       "id": "libvirt-2-virtio-format"
>     }
>   ],
>   "id": "libvirt-5217"
> }

You are supposed to run it on the destination side; there, the migration
yank instance should be present if QEMU hangs in the migration code.

Also, you need to execute it as an out-of-band command to bypass the
main loop. Like this:

'{"exec-oob": "yank", "id": "yank0", "arguments": {"instances": [ {"type": "migration"} ] } }'

I'm not sure if libvirt can do that; maybe you need to add an
additional QMP socket and do it outside of libvirt. Note that you need
to enable the oob feature during QMP negotiation, like this:

'{ "execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }'
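
For illustration, a complete session on such an extra socket might look like
the sketch below. The socket itself is an assumption here: it would have to be
added to the destination QEMU in advance, e.g. with
"-qmp unix:/tmp/qmp-extra.sock,server=on,wait=off" (directly on the command
line or via libvirt's <qemu:commandline> passthrough), and the exact greeting
fields are elided:

$ nc -U /tmp/qmp-extra.sock
{"QMP": {"version": {...}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities", "arguments": {"enable": ["oob"]}}
{"return": {}}
{"exec-oob": "yank", "id": "yank0", "arguments": {"instances": [{"type": "migration"}]}}
{"id": "yank0", "return": {}}

The exec-oob response comes back even while the main loop is stuck, which is
the whole point of out-of-band execution.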

Regards,
Lukas Straub

> 
> The libvirt migration job is stuck, as the following backtrace shows:
> migration is waiting for the "Finish" RPC on the destination side to
> return.
>
> ...
>
> IMHO, the key reason for the issue is that QEMU fails to run the main loop
> and thus fails to respond to QMP, which is not what we usually expect.
>
> Giving libvirt a window of time to issue a QMP command and kill the VM is
> the ideal solution for this issue; it provides an automatic method.
>
> I have not dug into the yank feature; perhaps it is helpful, but only as a
> manual intervention?
>
> After all, these two options are not mutually exclusive, I think.
> 
> 
> >
> > Best regards,
> > Lukas Straub
> >  
> 
> Thanks,
> Yong
> 



* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-08  7:01     ` Lukas Straub
@ 2025-08-08  8:02       ` Yong Huang
  2025-08-08 13:55         ` Fabiano Rosas
  0 siblings, 1 reply; 20+ messages in thread
From: Yong Huang @ 2025-08-08  8:02 UTC (permalink / raw)
  To: Lukas Straub; +Cc: qemu-devel, Peter Xu, Fabiano Rosas


On Fri, Aug 8, 2025 at 3:02 PM Lukas Straub <lukasstraub2@web.de> wrote:

> On Fri, 8 Aug 2025 10:36:24 +0800
> Yong Huang <yong.huang@smartx.com> wrote:
>
> > On Thu, Aug 7, 2025 at 5:36 PM Lukas Straub <lukasstraub2@web.de> wrote:
> >
> > > On Thu,  7 Aug 2025 10:41:17 +0800
> > > yong.huang@smartx.com wrote:
> > >
> > > > From: Hyman Huang <yong.huang@smartx.com>
> > > >
> > > > When there are network issues like missing TCP ACKs on the send
> > > > side during the multifd live migration. At the send side, the error
> > > > "Connection timed out" is thrown out and source QEMU process stop
> > > > sending data, at the receive side, The IO-channels may be blocked
> > > > at recvmsg() and thus the main loop gets stuck and fails to respond
> > > > to QMP commands consequently.
> > > > ...
> > >
> > > Hi Hyman Huang,
> > >
> > > Have you tried the 'yank' command to shutdown the sockets? It exactly
> > > meant to recover from hangs and should solve your issue.
> > >
> > >
> https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature
> >
> >
> > Thanks for the comment and advice.
> >
> > Let me give more details about the migration state when the issue
> happens:
> >
> > On the source side, libvirt has already aborted the migration job:
> >
> > $ virsh domjobinfo fdecd242-f278-4308-8c3b-46e144e55f63
> > Job type:         Failed
> > Operation:        Outgoing migration
> >
> > QMP query-yank shows that there is no migration yank instance:
> >
> > $ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
> > '{"execute":"query-yank"}' --pretty
> > {
> >   "return": [
> >     {
> >       "type": "chardev",
> >       "id": "charmonitor"
> >     },
> >     {
> >       "type": "chardev",
> >       "id": "charchannel0"
> >     },
> >     {
> >       "type": "chardev",
> >       "id": "libvirt-2-virtio-format"
> >     }
> >   ],
> >   "id": "libvirt-5217"
> > }
>
> You are supposed to run it on the destination side; there, the migration
> yank instance should be present if QEMU hangs in the migration code.
>
> Also, you need to execute it as an out-of-band command to bypass the
> main loop. Like this:
>
> '{"exec-oob": "yank", "id": "yank0", "arguments": {"instances": [ {"type":
> "migration"} ] } }'

In our case, libvirt's operations on the VM on the destination side have
been blocked by the migration job:

$ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
'{"query-commands"}' --pretty
error: Timed out during operation: cannot acquire state change lock (held
by monitor=remoteDispatchDomainMigratePrepare3Params)

So issuing the yank command through libvirt is not an option here.


>
>
> I'm not sure if libvirt can do that; maybe you need to add an
> additional QMP socket and do it outside of libvirt. Note that you need
> to enable the oob feature during QMP negotiation, like this:
>
> '{ "execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }'


No. I checked libvirt's source code and found that when the QEMU
monitor is initialized, libvirt disables OOB by default.

Therefore, perhaps we can first enable OOB and add the yank capability
to libvirt, and then add the yank logic to the necessary path, in our
case the migration code:

qemuMigrationDstFinish:
    if (retcode != 0) {
        /* Check for a possible error on the monitor in case Finish was called
         * earlier than monitor EOF handler got a chance to process the error
         */
        qemuDomainCheckMonitor(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN);
        goto endjob;
    }



>
> Regards,
> Lukas Straub
>
> >
> > The libvirt migration job is stuck, as the following backtrace shows:
> > migration is waiting for the "Finish" RPC on the destination side to
> > return.
> >
> > ...
> >
> > IMHO, the key reason for the issue is that QEMU fails to run the main
> > loop and thus fails to respond to QMP, which is not what we usually
> > expect.
> >
> > Giving libvirt a window of time to issue a QMP command and kill the VM
> > is the ideal solution for this issue; it provides an automatic method.
> >
> > I have not dug into the yank feature; perhaps it is helpful, but only
> > as a manual intervention?
> >
> > After all, these two options are not mutually exclusive, I think.
> >
> >
> > >
> > > Best regards,
> > > Lukas Straub
> > >
> >
> > Thanks,
> > Yong
> >
>
>

-- 
Best regards


* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-08  8:02       ` Yong Huang
@ 2025-08-08 13:55         ` Fabiano Rosas
  2025-08-08 15:37           ` Peter Xu
  2025-08-11  2:27           ` Yong Huang
  0 siblings, 2 replies; 20+ messages in thread
From: Fabiano Rosas @ 2025-08-08 13:55 UTC (permalink / raw)
  To: Yong Huang, Lukas Straub; +Cc: qemu-devel, Peter Xu

Yong Huang <yong.huang@smartx.com> writes:

> On Fri, Aug 8, 2025 at 3:02 PM Lukas Straub <lukasstraub2@web.de> wrote:
>
>> On Fri, 8 Aug 2025 10:36:24 +0800
>> Yong Huang <yong.huang@smartx.com> wrote:
>>
>> > On Thu, Aug 7, 2025 at 5:36 PM Lukas Straub <lukasstraub2@web.de> wrote:
>> >
>> > > On Thu,  7 Aug 2025 10:41:17 +0800
>> > > yong.huang@smartx.com wrote:
>> > >
>> > > > From: Hyman Huang <yong.huang@smartx.com>
>> > > >
>> > > > When there are network issues like missing TCP ACKs on the send
>> > > > side during the multifd live migration. At the send side, the error
>> > > > "Connection timed out" is thrown out and source QEMU process stop
>> > > > sending data, at the receive side, The IO-channels may be blocked
>> > > > at recvmsg() and thus the main loop gets stuck and fails to respond
>> > > > to QMP commands consequently.
>> > > > ...
>> > >
>> > > Hi Hyman Huang,
>> > >
>> > > Have you tried the 'yank' command to shutdown the sockets? It exactly
>> > > meant to recover from hangs and should solve your issue.
>> > >
>> > >
>> https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature
>> >
>> >
>> > Thanks for the comment and advice.
>> >
>> > Let me give more details about the migration state when the issue
>> happens:
>> >
>> > On the source side, libvirt has already aborted the migration job:
>> >
>> > $ virsh domjobinfo fdecd242-f278-4308-8c3b-46e144e55f63
>> > Job type:         Failed
>> > Operation:        Outgoing migration
>> >
>> > QMP query-yank shows that there is no migration yank instance:
>> >
>> > $ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
>> > '{"execute":"query-yank"}' --pretty
>> > {
>> >   "return": [
>> >     {
>> >       "type": "chardev",
>> >       "id": "charmonitor"
>> >     },
>> >     {
>> >       "type": "chardev",
>> >       "id": "charchannel0"
>> >     },
>> >     {
>> >       "type": "chardev",
>> >       "id": "libvirt-2-virtio-format"
>> >     }
>> >   ],
>> >   "id": "libvirt-5217"
>> > }
>>
>> You are supposed to run it on the destination side, there the migration
>> yank instance should be present if qemu hangs in the migration code.
>>
>> Also, you need to execute it as an out-of-band command to bypass the
>> main loop. Like this:
>>
>> '{"exec-oob": "yank", "id": "yank0", "arguments": {"instances": [ {"type":
>> "migration"} ] } }'
>
> In our case, Libvirt's operation about the VM on the destination side has
> been blocked
> by Migration JOB:
>
> $ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
> '{"query-commands"}' --pretty
> error: Timed out during operation: cannot acquire state change lock (held
> by monitor=remoteDispatchDomainMigratePrepare3Params)
> Using Libvirt to issue the yank command can not be taken into account.
>
>
>>
>>
>> I'm not sure if libvirt can do that, maybe you need to add an
>> additional qmp socket and do it outside of libvirt. Note that you need
>> to enable the oob feature during qmp negotiation, like this:
>>
>> '{ "execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }'
>
>
> No, I checked Libvirt's source code and figured out that when the QEMU
> monitor is initialized, Libvirt by default disables the OOB.
>
> Therefore, perhaps we can first enable the OOB and add the yank capability
> to Libvirt then adding the yank logic to the necessary path—in our
> instance, the migration code:
>
> qemuMigrationDstFinish:
>     if (retcode != 0) {
>         /* Check for a possible error on the monitor in case Finish was called
>          * earlier than monitor EOF handler got a chance to process the error
>          */
>         qemuDomainCheckMonitor(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN);
>         goto endjob;
>     }
>
>
>
>>
>> Regards,
>> Lukas Straub
>>
>> >
>> > The libvirt migration job is stuck as the following backtrace shows; it
>> > shows that migration is waiting for the "Finish" RPC on the destination
>> > side to return.
>> >
>> > ...
>> >
>> > IMHO, the key reason for the issue is that QEMU fails to run the main
>> loop
>> > and fails to respond to QMP, which is not what we usually expected.
>> >
>> > Giving the Libvirt a window of time to issue a QMP and kill the VM is the
>> > ideal solution for this issue; this provides an automatic method.
>> >
>> > I do not dig the yank feature, perhaps it is helpful, but only manually?
>> >
>> > After all, these two options are not exclusive of one another,  I think.
>> >

Please work with Lukas to figure out whether yank can be used here. I
think that's the correct approach. If the main loop is blocked, then
some out-of-band cancellation routine is needed. migrate_cancel() could
be it, but at the moment it's not. Yank is the second best thing.

The need for a timeout is usually indicative of a design issue. In this
case, the choice of a coroutine for the incoming side is the obvious
one. Peter will tell you all about it! =)



* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-08 13:55         ` Fabiano Rosas
@ 2025-08-08 15:37           ` Peter Xu
  2025-08-11  2:25             ` Yong Huang
  2025-08-11  7:03             ` Lukas Straub
  2025-08-11  2:27           ` Yong Huang
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Xu @ 2025-08-08 15:37 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Yong Huang, Lukas Straub, qemu-devel

On Fri, Aug 08, 2025 at 10:55:25AM -0300, Fabiano Rosas wrote:
> Please work with Lukas to figure out whether yank can be used here. I
> think that's the correct approach. If the main loop is blocked, then
> some out-of-band cancellation routine is needed. migrate_cancel() could
> be it, but at the moment it's not. Yank is the second best thing.

I agree.

migrate_cancel() should really be an OOB command. It should be a superset
of the yank features, plus anything migration-specific beyond yanking the
channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.
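
For reference, the schema-level half of that change would be roughly the
following in qapi/migration.json (a sketch only; the real work is making sure
the handler never blocks or takes the BQL, since OOB commands run directly in
the monitor I/O thread):

{ 'command': 'migrate_cancel', 'allow-oob': true }

If I remember correctly, migrate-recover in the same file already uses
'allow-oob', so there is precedent.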

I'll add this into my todo; maybe I can do something with it this release.
I'm happy if anyone would beat me to it.

> 
> The need for a timeout is usually indicative of a design issue. In this
> case, the choice of a coroutine for the incoming side is the obvious
> one. Peter will tell you all about it! =)

Nah. :)

-- 
Peter Xu




* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-07  2:41 [PATCH] multifd: Make the main thread yield periodically to the main loop yong.huang
                   ` (2 preceding siblings ...)
  2025-08-08  6:36 ` Yong Huang
@ 2025-08-08 15:42 ` Peter Xu
  2025-08-11  2:02   ` Yong Huang
  2025-08-19 10:19 ` Daniel P. Berrangé
  4 siblings, 1 reply; 20+ messages in thread
From: Peter Xu @ 2025-08-08 15:42 UTC (permalink / raw)
  To: yong.huang; +Cc: qemu-devel, Fabiano Rosas

On Thu, Aug 07, 2025 at 10:41:17AM +0800, yong.huang@smartx.com wrote:
> diff --git a/migration/multifd.c b/migration/multifd.c
> index b255778855..aca0aeb341 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1228,6 +1228,16 @@ void multifd_recv_sync_main(void)
>              }
>          }
>          trace_multifd_recv_sync_main_signal(p->id);
> +        do {
> +            if (qemu_sem_timedwait(&multifd_recv_state->sem_sync, 10000) == 0) {
> +                break;
> +            }
> +            if (qemu_in_coroutine()) {
> +                aio_co_schedule(qemu_get_current_aio_context(),
> +                                qemu_coroutine_self());
> +                qemu_coroutine_yield();
> +            }
> +        } while (1);

I still think either yank or fixing migrate_cancel is the way to go, but
when staring at this change.. I don't think I understand this patch at all.

It timedwait()s on the sem_sync that we just consumed.  Do you at least
need to remove the ones above this piece of code to not hang forever?

    for (i = 0; i < thread_count; i++) {
        trace_multifd_recv_sync_main_wait(i);
        qemu_sem_wait(&multifd_recv_state->sem_sync);
    }

>          qemu_sem_post(&p->sem_sync);
>      }
>      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
> -- 
> 2.27.0
> 

-- 
Peter Xu




* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-08 15:42 ` Peter Xu
@ 2025-08-11  2:02   ` Yong Huang
  0 siblings, 0 replies; 20+ messages in thread
From: Yong Huang @ 2025-08-11  2:02 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas


On Fri, Aug 8, 2025 at 11:42 PM Peter Xu <peterx@redhat.com> wrote:

> On Thu, Aug 07, 2025 at 10:41:17AM +0800, yong.huang@smartx.com wrote:
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index b255778855..aca0aeb341 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -1228,6 +1228,16 @@ void multifd_recv_sync_main(void)
> >              }
> >          }
> >          trace_multifd_recv_sync_main_signal(p->id);
> > +        do {
> > +            if (qemu_sem_timedwait(&multifd_recv_state->sem_sync,
> 10000) == 0) {
> > +                break;
> > +            }
> > +            if (qemu_in_coroutine()) {
> > +                aio_co_schedule(qemu_get_current_aio_context(),
> > +                                qemu_coroutine_self());
> > +                qemu_coroutine_yield();
> > +            }
> > +        } while (1);
>
> I still think either yank or fixing migrate_cancel is the way to go, but
> when staring at this change.. I don't think I understand this patch at all.
>
> It timedwait()s on the sem_sync that we just consumed.  Do you at least
> need to remove the ones above this piece of code to not hang forever?
>

Yes, thanks for pointing that out. I missed that because this patch was
cherry-picked manually from QEMU 6.2.0. :(
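
For clarity, a minimal sketch of where the timed wait would have to live,
replacing the plain qemu_sem_wait() loop that Peter quotes below rather than
sitting after it (error/cancel handling omitted, not the actual v2):

    for (i = 0; i < thread_count; i++) {
        trace_multifd_recv_sync_main_wait(i);
        /* Wait for each channel to signal sync, but wake up every 10s. */
        while (qemu_sem_timedwait(&multifd_recv_state->sem_sync, 10000)) {
            if (qemu_in_coroutine()) {
                /* Give the main loop a chance to run (QMP, timers, ...)
                 * before going back to waiting. */
                aio_co_schedule(qemu_get_current_aio_context(),
                                qemu_coroutine_self());
                qemu_coroutine_yield();
            }
        }
    }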


>
>     for (i = 0; i < thread_count; i++) {
>         trace_multifd_recv_sync_main_wait(i);
>         qemu_sem_wait(&multifd_recv_state->sem_sync);
>     }
>
> >          qemu_sem_post(&p->sem_sync);
> >      }
> >      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
> > --
> > 2.27.0
> >
>
> --
> Peter Xu
>
>

-- 
Best regards


* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-08 15:37           ` Peter Xu
@ 2025-08-11  2:25             ` Yong Huang
  2025-08-11  7:03             ` Lukas Straub
  1 sibling, 0 replies; 20+ messages in thread
From: Yong Huang @ 2025-08-11  2:25 UTC (permalink / raw)
  To: Peter Xu; +Cc: Fabiano Rosas, Lukas Straub, qemu-devel


On Fri, Aug 8, 2025 at 11:37 PM Peter Xu <peterx@redhat.com> wrote:

> On Fri, Aug 08, 2025 at 10:55:25AM -0300, Fabiano Rosas wrote:
> > Please work with Lukas to figure out whether yank can be used here. I
> > think that's the correct approach. If the main loop is blocked, then
> > some out-of-band cancellation routine is needed. migrate_cancel() could
> > be it, but at the moment it's not. Yank is the second best thing.
>
> I agree.
>
> migrate_cancel() should really be an OOB command. It should be a superset
> of the yank features, plus anything migration-specific beyond yanking the
> channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.


> I'll add this into my todo; maybe I can do something with it this release.
> I'm happy if anyone would beat me to it.
>

Are there any suggestions on how I could fix migrate_cancel in the
OOB-command way?
Maybe that could be a preceding patchset for your work.


>
> >
> > The need for a timeout is usually indicative of a design issue. In this
> > case, the choice of a coroutine for the incoming side is the obvious
> > one. Peter will tell you all about it! =)
>
> Nah. :)
>
> --
> Peter Xu
>
>

-- 
Best regards


* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-08 13:55         ` Fabiano Rosas
  2025-08-08 15:37           ` Peter Xu
@ 2025-08-11  2:27           ` Yong Huang
  1 sibling, 0 replies; 20+ messages in thread
From: Yong Huang @ 2025-08-11  2:27 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Lukas Straub, qemu-devel, Peter Xu


On Fri, Aug 8, 2025 at 9:55 PM Fabiano Rosas <farosas@suse.de> wrote:

> Yong Huang <yong.huang@smartx.com> writes:
>
> > On Fri, Aug 8, 2025 at 3:02 PM Lukas Straub <lukasstraub2@web.de> wrote:
> >
> >> On Fri, 8 Aug 2025 10:36:24 +0800
> >> Yong Huang <yong.huang@smartx.com> wrote:
> >>
> >> > On Thu, Aug 7, 2025 at 5:36 PM Lukas Straub <lukasstraub2@web.de>
> wrote:
> >> >
> >> > > On Thu,  7 Aug 2025 10:41:17 +0800
> >> > > yong.huang@smartx.com wrote:
> >> > >
> >> > > > From: Hyman Huang <yong.huang@smartx.com>
> >> > > >
> >> > > > When there are network issues like missing TCP ACKs on the send
> >> > > > side during the multifd live migration. At the send side, the
> error
> >> > > > "Connection timed out" is thrown out and source QEMU process stop
> >> > > > sending data, at the receive side, The IO-channels may be blocked
> >> > > > at recvmsg() and thus the main loop gets stuck and fails to
> respond
> >> > > > to QMP commands consequently.
> >> > > > ...
> >> > >
> >> > > Hi Hyman Huang,
> >> > >
> >> > > Have you tried the 'yank' command to shutdown the sockets? It
> exactly
> >> > > meant to recover from hangs and should solve your issue.
> >> > >
> >> > >
> >> https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature
> >> >
> >> >
> >> > Thanks for the comment and advice.
> >> >
> >> > Let me give more details about the migration state when the issue
> >> happens:
> >> >
> >> > On the source side, libvirt has already aborted the migration job:
> >> >
> >> > $ virsh domjobinfo fdecd242-f278-4308-8c3b-46e144e55f63
> >> > Job type:         Failed
> >> > Operation:        Outgoing migration
> >> >
> >> > QMP query-yank shows that there is no migration yank instance:
> >> >
> >> > $ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
> >> > '{"execute":"query-yank"}' --pretty
> >> > {
> >> >   "return": [
> >> >     {
> >> >       "type": "chardev",
> >> >       "id": "charmonitor"
> >> >     },
> >> >     {
> >> >       "type": "chardev",
> >> >       "id": "charchannel0"
> >> >     },
> >> >     {
> >> >       "type": "chardev",
> >> >       "id": "libvirt-2-virtio-format"
> >> >     }
> >> >   ],
> >> >   "id": "libvirt-5217"
> >> > }
> >>
> >> You are supposed to run it on the destination side, there the migration
> >> yank instance should be present if qemu hangs in the migration code.
> >>
> >> Also, you need to execute it as an out-of-band command to bypass the
> >> main loop. Like this:
> >>
> >> '{"exec-oob": "yank", "id": "yank0", "arguments": {"instances": [
> {"type":
> >> "migration"} ] } }'
> >
> > In our case, Libvirt's operations on the VM on the destination side have
> > been blocked by the migration job:
> >
> > $ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
> > '{"query-commands"}' --pretty
> > error: Timed out during operation: cannot acquire state change lock (held
> > by monitor=remoteDispatchDomainMigratePrepare3Params)
> >
> > So issuing the yank command through Libvirt is not an option.
> >
> >
> >>
> >>
> >> I'm not sure if libvirt can do that; maybe you need to add an
> >> additional qmp socket and do it outside of libvirt. Note that you need
> >> to enable the oob feature during qmp negotiation, like this:
> >>
> >> '{ "execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] }
> }'
> >
> >
> > No, I checked Libvirt's source code and found that Libvirt disables OOB
> > by default when the QEMU monitor is initialized.
> >
> > Therefore, perhaps we can first enable OOB and add the yank capability
> > to Libvirt, and then add the yank logic to the necessary path, which in
> > our case is the migration code:
> >
> > qemuMigrationDstFinish:
> >     if (retcode != 0) {
> >         /* Check for a possible error on the monitor in case Finish was
> >          * called earlier than monitor EOF handler got a chance to process
> >          * the error
> >          */
> >         qemuDomainCheckMonitor(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN);
> >         goto endjob;
> >     }
> >
> >
> >
> >>
> >> Regards,
> >> Lukas Straub
> >>
> >> >
> >> > The libvirt migration job is stuck as the following backtrace shows;
> >> > it shows that the migration is waiting for the "Finish" RPC on the
> >> > destination side to return.
> >> >
> >> > ...
> >> >
> >> > IMHO, the key reason for the issue is that QEMU fails to run the main
> >> > loop and fails to respond to QMP, which is not what we usually expect.
> >> >
> >> > Giving Libvirt a window of time to issue a QMP command and kill the VM
> >> > is the ideal solution for this issue; it provides an automatic method.
> >> >
> >> > I have not dug into the yank feature; perhaps it is helpful, but only
> >> > manually?
> >> >
> >> > After all, these two options are not exclusive of one another, I think.
> >> >
>
> Please work with Lukas to figure out whether yank can be used here. I
> think that's the correct approach. If the main loop is blocked, then
> some out-of-band cancellation routine is needed. migrate_cancel() could
> be it, but at the moment it's not. Yank is the second best thing.


OK, got it.


>
>
> The need for a timeout is usually indicative of a design issue. In this
> case, the choice of a coroutine for the incoming side is the obvious
> one. Peter will tell you all about it! =)
>


-- 
Best regards

[-- Attachment #2: Type: text/html, Size: 8745 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-08 15:37           ` Peter Xu
  2025-08-11  2:25             ` Yong Huang
@ 2025-08-11  7:03             ` Lukas Straub
  2025-08-11 13:53               ` Fabiano Rosas
  1 sibling, 1 reply; 20+ messages in thread
From: Lukas Straub @ 2025-08-11  7:03 UTC (permalink / raw)
  To: Peter Xu; +Cc: Fabiano Rosas, Yong Huang, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1343 bytes --]

On Fri, 8 Aug 2025 11:37:23 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Aug 08, 2025 at 10:55:25AM -0300, Fabiano Rosas wrote:
> > Please work with Lukas to figure out whether yank can be used here. I
> > think that's the correct approach. If the main loop is blocked, then
> > some out-of-band cancellation routine is needed. migrate_cancel() could
> > be it, but at the moment it's not. Yank is the second best thing.  
> 
> I agree.
> 
> migrate_cancel() should really be an OOB command..  It should be a superset
> of yank features, plus anything migration-specific besides yanking the
> channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.

Hmm, I think the migration code should handle this properly even if the
yank command is used. From the POV of migration, it sees that the
connection broke with connection reset. That is the same error as if the
other side crashes/is killed or a NAT/stateful firewall in between
reboots.

> 
> I'll add this into my todo; maybe I can do something with it this release.
> I'm happy if anyone would beat me to it.
> 
> > 
> > The need for a timeout is usually indicative of a design issue. In this
> > case, the choice of a coroutine for the incoming side is the obvious
> > one. Peter will tell you all about it! =)  
> 
> Nah. :)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-11  7:03             ` Lukas Straub
@ 2025-08-11 13:53               ` Fabiano Rosas
  2025-08-19 10:31                 ` Daniel P. Berrangé
  0 siblings, 1 reply; 20+ messages in thread
From: Fabiano Rosas @ 2025-08-11 13:53 UTC (permalink / raw)
  To: Lukas Straub, Peter Xu; +Cc: Yong Huang, qemu-devel

Lukas Straub <lukasstraub2@web.de> writes:

> On Fri, 8 Aug 2025 11:37:23 -0400
> Peter Xu <peterx@redhat.com> wrote:
>
>> On Fri, Aug 08, 2025 at 10:55:25AM -0300, Fabiano Rosas wrote:
>> > Please work with Lukas to figure out whether yank can be used here. I
>> > think that's the correct approach. If the main loop is blocked, then
>> > some out-of-band cancellation routine is needed. migrate_cancel() could
>> > be it, but at the moment it's not. Yank is the second best thing.  
>> 
>> I agree.
>> 
>> migrate_cancel() should really be an OOB command..  It should be a superset
>> of yank features, plus anything migration-specific besides yanking the
>> channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.
>
> Hmm, I think the migration code should handle this properly even if the
> yank command is used. From the POV of migration, it sees that the
> connection broke with connection reset. That is the same error as if the
> other side crashes/is killed or a NAT/stateful firewall in between
> reboots.
>

That should all work just fine. After yank or after a detectable network
failure. The issue here seems to be that the destination recv is hanging
indefinitely. I don't think we ever played with socket timeout
configurations, or even switching to non-blocking during the sync. This
is actually (AFAIK) the first time we get a hang that's not "just" a
synchronization issue in the migration code.
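
(Purely for illustration, not something the migration code sets today: one
example of a "socket timeout configuration" in this sense is SO_RCVTIMEO;
the helper and values below are made up.)

#include <sys/socket.h>
#include <sys/time.h>

/* Cap how long a blocking recvmsg() on 'sockfd' may wait for data. */
static int set_recv_timeout(int sockfd, int seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    /*
     * After this, a recvmsg() that sees no data for 'seconds' returns -1
     * with errno set to EAGAIN/EWOULDBLOCK instead of hanging forever.
     */
    return setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}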

>> 
>> I'll add this into my todo; maybe I can do something with it this release.
>> I'm happy if anyone would beat me to it.
>> 
>> > 
>> > The need for a timeout is usually indicative of a design issue. In this
>> > case, the choice of a coroutine for the incoming side is the obvious
>> > one. Peter will tell you all about it! =)  
>> 
>> Nah. :)
>> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-07  2:41 [PATCH] multifd: Make the main thread yield periodically to the main loop yong.huang
                   ` (3 preceding siblings ...)
  2025-08-08 15:42 ` Peter Xu
@ 2025-08-19 10:19 ` Daniel P. Berrangé
  4 siblings, 0 replies; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-08-19 10:19 UTC (permalink / raw)
  To: yong.huang; +Cc: qemu-devel, Peter Xu, Fabiano Rosas

On Thu, Aug 07, 2025 at 10:41:17AM +0800, yong.huang@smartx.com wrote:
> From: Hyman Huang <yong.huang@smartx.com>
> 
> When there are network issues like missing TCP ACKs on the send
> side during the multifd live migration. At the send side, the error
> "Connection timed out" is thrown out and source QEMU process stop
> sending data, at the receive side, The IO-channels may be blocked
> at recvmsg() and thus the main loop gets stuck and fails to respond
> to QMP commands consequently.

The core contract of the main event loop thread is that *NOTHING*
must ever go into a blocking sleep/wait state, precisely because
this breaks other functionality using the event loop such as QMP.

> The QEMU backtrace at the receive side with the main thread and two
> multi-channel threads is displayed as follows:

snip

> main thread:
> Thread 1 (Thread 0x7fd45f1fbe40 (LWP 1413088)):
> 0  0x00007fd46066b616 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x5556d7604e80) at ../sysdeps/unix/sysv/linux/futex-internal.h:216
> 1  do_futex_wait (sem=sem@entry=0x5556d7604e80, abstime=0x0) at sem_waitcommon.c:111
> 2  0x00007fd46066b708 in __new_sem_wait_slow (sem=sem@entry=0x5556d7604e80, abstime=0x0) at sem_waitcommon.c:183
> 3  0x00007fd46066b779 in __new_sem_wait (sem=sem@entry=0x5556d7604e80) at sem_wait.c:42
> 4  0x00005556d5415524 in qemu_sem_wait (sem=0x5556d7604e80) at ../util/qemu-thread-posix.c:358
> 5  0x00005556d4fa5e99 in multifd_recv_sync_main () at ../migration/multifd.c:1052
> 6  0x00005556d521ed65 in ram_load_precopy (f=f@entry=0x5556d75dfb90) at ../migration/ram.c:4446
> 7  0x00005556d521f1dd in ram_load (f=0x5556d75dfb90, opaque=<optimized out>, version_id=4) at ../migration/ram.c:4495
> 8  0x00005556d4faa3e7 in vmstate_load (f=f@entry=0x5556d75dfb90, se=se@entry=0x5556d6083070) at ../migration/savevm.c:909
> 9  0x00005556d4fae7a0 in qemu_loadvm_section_part_end (mis=0x5556d6082cc0, f=0x5556d75dfb90) at ../migration/savevm.c:2475
> 10 qemu_loadvm_state_main (f=f@entry=0x5556d75dfb90, mis=mis@entry=0x5556d6082cc0) at ../migration/savevm.c:2634
> 11 0x00005556d4fafbd5 in qemu_loadvm_state (f=0x5556d75dfb90) at ../migration/savevm.c:2706
> 12 0x00005556d4f9ebdb in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:561
> 13 0x00005556d542513b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:186
> 14 0x00007fd4604ef970 in ?? () from target:/lib64/libc.so.6

Here we see the main event thread is running a migration
coroutine, and the migration code has gone into a blocking
sleep via qemu_sem_wait, which is a violation of the main
event thread contract.

> 
> Once the QEMU process falls into the above state in the presence of
> the network errors, live migration cannot be canceled gracefully,
> leaving the destination VM in the "paused" state, since the QEMU
> process on the destination side doesn't respond to the QMP command
> "migrate_cancel".
> 
> To fix that, make the main thread yield to the main loop after waiting
> too long for the multi-channels to finish receiving data during one
> iteration. 10 seconds is a sufficient timeout period to set.
> 
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
>  migration/multifd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index b255778855..aca0aeb341 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1228,6 +1228,16 @@ void multifd_recv_sync_main(void)
>              }
>          }
>          trace_multifd_recv_sync_main_signal(p->id);
> +        do {
> +            if (qemu_sem_timedwait(&multifd_recv_state->sem_sync, 10000) == 0) {
> +                break;
> +            }
> +            if (qemu_in_coroutine()) {
> +                aio_co_schedule(qemu_get_current_aio_context(),
> +                                qemu_coroutine_self());
> +                qemu_coroutine_yield();
> +            }
> +        } while (1);

This tries to work around the violation of the event loop contract using
short timeouts for the semaphore wait, but IMHO that is just papering
over the design flaw.

The migration code should not be using semaphores at all for sync purposes
if it wants to be running in a coroutine from the event loop thread. It
either needs to use some synchronization mechanism that can be polled by
the event thread in a non-blocking manner, or this code needs to move to
a background thread instead of a coroutine.
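
A very rough sketch of the first option, as additions to migration/multifd.c
(untested; 'sync_waiter', 'sync_done' and both helpers are made-up names,
and error/shutdown handling is ignored entirely):

/* hypothetical new members of multifd_recv_state */
Coroutine *sync_waiter;   /* coroutine parked at the sync point */
bool       sync_done;     /* set by a recv thread once it has synced */

/* runs in the incoming-migration coroutine on the main thread */
static void coroutine_fn multifd_recv_sync_wait(void)
{
    qatomic_set(&multifd_recv_state->sync_waiter, qemu_coroutine_self());

    if (qatomic_read(&multifd_recv_state->sync_done) &&
        qatomic_xchg(&multifd_recv_state->sync_waiter, NULL)) {
        /* the recv thread finished first and never claimed the waiter */
        qatomic_set(&multifd_recv_state->sync_done, false);
        return;
    }

    /* the recv thread owns (or will own) the waiter and reschedules us */
    qemu_coroutine_yield();
    qatomic_set(&multifd_recv_state->sync_done, false);
}

/* runs in a multifd recv thread once it reaches the sync point */
static void multifd_recv_sync_wake(void)
{
    Coroutine *co;

    qatomic_set(&multifd_recv_state->sync_done, true);
    co = qatomic_xchg(&multifd_recv_state->sync_waiter, NULL);
    if (co) {
        /* re-enter the parked coroutine from the main loop, not from here */
        aio_co_schedule(qemu_get_aio_context(), co);
    }
}

The point being that the main thread only ever yields back to the event loop,
and the recv thread wakes it through aio_co_schedule(), so nothing blocks.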

>          qemu_sem_post(&p->sem_sync);
>      }
>      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
> -- 
> 2.27.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-11 13:53               ` Fabiano Rosas
@ 2025-08-19 10:31                 ` Daniel P. Berrangé
  2025-08-19 12:03                   ` Lukas Straub
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-08-19 10:31 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Lukas Straub, Peter Xu, Yong Huang, qemu-devel

On Mon, Aug 11, 2025 at 10:53:11AM -0300, Fabiano Rosas wrote:
> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > On Fri, 8 Aug 2025 11:37:23 -0400
> > Peter Xu <peterx@redhat.com> wrote:
> >
> >> On Fri, Aug 08, 2025 at 10:55:25AM -0300, Fabiano Rosas wrote:
> >> > Please work with Lukas to figure out whether yank can be used here. I
> >> > think that's the correct approach. If the main loop is blocked, then
> >> > some out-of-band cancellation routine is needed. migrate_cancel() could
> >> > be it, but at the moment it's not. Yank is the second best thing.  
> >> 
> >> I agree.
> >> 
> >> migrate_cancel() should really be an OOB command..  It should be a superset
> >> of yank features, plus anything migration-specific besides yanking the
> >> channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.
> >
> > Hmm, I think the migration code should handle this properly even if the
> > yank command is used. From the POV of migration, it sees that the
> > connection broke with connection reset. That is the same error as if the
> > other side crashes/is killed or a NAT/stateful firewall in between
> > reboots.
> >
> 
> That should all work just fine. After yank or after a detectable network
> failure. The issue here seems to be that the destination recv is hanging
> indefinitely. I don't think we ever played with socket timeout
> configurations, or even switching to non-blocking during the sync. This
> is actually (AFAIK) the first time we get a hang that's not "just" a
> synchronization issue in the migration code.

Based on the stack trace, whether the socket is blocking or not isn't a
problem - QEMU is stuck in a  sem_wait call that will delay the coroutine,
and thus the thread, indefinitely. IMHO the semaphore usage needs to be
removed in favour of a synchronization mechanism that can integrate with
event loop such that the coroutine does not block.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-19 10:31                 ` Daniel P. Berrangé
@ 2025-08-19 12:03                   ` Lukas Straub
  2025-08-19 12:07                     ` Daniel P. Berrangé
  0 siblings, 1 reply; 20+ messages in thread
From: Lukas Straub @ 2025-08-19 12:03 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Fabiano Rosas, Peter Xu, Yong Huang, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1943 bytes --]

On Tue, 19 Aug 2025 11:31:03 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, Aug 11, 2025 at 10:53:11AM -0300, Fabiano Rosas wrote:
> > Lukas Straub <lukasstraub2@web.de> writes:
> >   
> > > On Fri, 8 Aug 2025 11:37:23 -0400
> > > Peter Xu <peterx@redhat.com> wrote:
> > >> ...
> > >> migrate_cancel() should really be an OOB command..  It should be a superset
> > >> of yank features, plus anything migration-specific besides yanking the
> > >> channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.
> > >
> > > Hmm, I think the migration code should handle this properly even if the
> > > yank command is used. From the POV of migration, it sees that the
> > > connection broke with connection reset. That is the same error as if the
> > > other side crashes/is killed or a NAT/stateful firewall in between
> > > reboots.
> > >  
> > 
> > That should all work just fine. After yank or after a detectable network
> > failure. The issue here seems to be that the destination recv is hanging
> > indefinitely. I don't think we ever played with socket timeout
> > configurations, or even switching to non-blocking during the sync. This
> > is actually (AFAIK) the first time we get a hang that's not "just" a
> > synchronization issue in the migration code.  
> 
> Based on the stack trace, whether the socket is blocking or not isn't a
> problem - QEMU is stuck in a  sem_wait call that will delay the coroutine,
> and thus the thread, indefinitely. IMHO the semaphore usage needs to be
> removed in favour of a synchronization mechanism that can integrate with
> event loop such that the coroutine does not block.
> 

I don't think that is an issue. The semaphore is just there to sync
with the multifd threads, which are in turn blocking on recvmsg.

Without multifd the main thread would hang in recvmsg as well in this
scenario.

Best Regards,
Lukas Straub

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-19 12:03                   ` Lukas Straub
@ 2025-08-19 12:07                     ` Daniel P. Berrangé
  2025-08-19 20:03                       ` Peter Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-08-19 12:07 UTC (permalink / raw)
  To: Lukas Straub; +Cc: Fabiano Rosas, Peter Xu, Yong Huang, qemu-devel

On Tue, Aug 19, 2025 at 02:03:26PM +0200, Lukas Straub wrote:
> On Tue, 19 Aug 2025 11:31:03 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Mon, Aug 11, 2025 at 10:53:11AM -0300, Fabiano Rosas wrote:
> > > Lukas Straub <lukasstraub2@web.de> writes:
> > >   
> > > > On Fri, 8 Aug 2025 11:37:23 -0400
> > > > Peter Xu <peterx@redhat.com> wrote:
> > > >> ...
> > > >> migrate_cancel() should really be an OOB command..  It should be a superset
> > > >> of yank features, plus anything migration-specific besides yanking the
> > > >> channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.
> > > >
> > > > Hmm, I think the migration code should handle this properly even if the
> > > > yank command is used. From the POV of migration, it sees that the
> > > > connection broke with connection reset. That is the same error as if the
> > > > other side crashes/is killed or a NAT/stateful firewall in between
> > > > reboots.
> > > >  
> > > 
> > > That should all work just fine. After yank or after a detectable network
> > > failure. The issue here seems to be that the destination recv is hanging
> > > indefinitely. I don't think we ever played with socket timeout
> > > configurations, or even switching to non-blocking during the sync. This
> > > is actually (AFAIK) the first time we get a hang that's not "just" a
> > > synchronization issue in the migration code.  
> > 
> > Based on the stack trace, whether the socket is blocking or not isn't a
> > problem - QEMU is stuck in a  sem_wait call that will delay the coroutine,
> > and thus the thread, indefinitely. IMHO the semaphore usage needs to be
> > removed in favour of a synchronization mechanism that can integrate with
> > event loop such that the coroutine does not block.
> > 
> 
> I don't think that is an issue. The semaphore is just there to sync
> with the multifd threads, which are in turn blocking on recvmsg.
> 
> Without multifd the main thread would hang in recvmsg as well in this
> scenario.

If it is using blocking I/O that would hang, but that's another thing
that should not be done.  The QIOChannel code supports using non-blocking
sockets in a blocking manner by yielding the coroutine.
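
For illustration only (the helper below is made up and this is not how the
multifd recv threads currently read), the pattern looks roughly like:

#include "qemu/osdep.h"
#include "io/channel.h"

/*
 * Blocking semantics for the caller without blocking the thread: with the
 * channel set to non-blocking, a read issued from coroutine context makes
 * qio_channel_readv_full_all() yield the coroutine (via qio_channel_yield())
 * until the fd is readable, so the main loop keeps running and QMP stays
 * responsive.
 */
static int coroutine_fn read_all_in_coroutine(QIOChannel *ioc,
                                              void *buf, size_t len,
                                              Error **errp)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    qio_channel_set_blocking(ioc, false, NULL);

    return qio_channel_readv_full_all(ioc, &iov, 1, NULL, NULL, errp);
}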

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
  2025-08-19 12:07                     ` Daniel P. Berrangé
@ 2025-08-19 20:03                       ` Peter Xu
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Xu @ 2025-08-19 20:03 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Lukas Straub, Fabiano Rosas, Yong Huang, qemu-devel

On Tue, Aug 19, 2025 at 01:07:28PM +0100, Daniel P. Berrangé wrote:
> On Tue, Aug 19, 2025 at 02:03:26PM +0200, Lukas Straub wrote:
> > On Tue, 19 Aug 2025 11:31:03 +0100
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > 
> > > On Mon, Aug 11, 2025 at 10:53:11AM -0300, Fabiano Rosas wrote:
> > > > Lukas Straub <lukasstraub2@web.de> writes:
> > > >   
> > > > > On Fri, 8 Aug 2025 11:37:23 -0400
> > > > > Peter Xu <peterx@redhat.com> wrote:
> > > > >> ...
> > > > >> migrate_cancel() should really be an OOB command..  It should be a superset
> > > > >> of yank features, plus anything migration-specific besides yanking the
> > > > >> channels, for example, when the migration thread is blocked in PRE_SWITCHOVER.
> > > > >
> > > > > Hmm, I think the migration code should handle this properly even if the
> > > > > yank command is used. From the POV of migration, it sees that the
> > > > > connection broke with connection reset. That is the same error as if the
> > > > > other side crashes/is killed or a NAT/stateful firewall in between
> > > > > reboots.
> > > > >  
> > > > 
> > > > That should all work just fine. After yank or after a detectable network
> > > > failure. The issue here seems to be that the destination recv is hanging
> > > > indefinitely. I don't think we ever played with socket timeout
> > > > configurations, or even switching to non-blocking during the sync. This
> > > > is actually (AFAIK) the first time we get a hang that's not "just" a
> > > > synchronization issue in the migration code.  
> > > 
> > > Based on the stack trace, whether the socket is blocking or not isn't a
> > > problem - QEMU is stuck in a  sem_wait call that will delay the coroutine,
> > > and thus the thread, indefinitely. IMHO the semaphore usage needs to be
> > > removed in favour of a synchronization mechanism that can integrate with
> > > event loop such that the coroutine does not block.
> > > 
> > 
> > I don't think that is an issue. The semaphore is just there to sync
> > with the multifd threads, which are in turn blocking on recvmsg.
> > 
> > Without multifd the main thread would hang in recvmsg as well in this
> > scenario.
> 
> If it is using blocking I/O that would hang, but that's another thing
> that should not be done.  The QIOChannel code supports using non-blocking
> sockets in a blocking manner by yielding the coroutine.

The thing is, the multifd feature, as a whole, is built on a thread-based
model.  It doesn't have any other coroutines to yield, AFAIU..

Instead, I do want to make the precopy load on dest QEMU also happen in a
separate thread instead of the main thread at some point.

I did try it once but it isn't trivial.  Unlike savevm, there are quite a few
assumptions that the BQL will be held when loading the VM.  But maybe I
should keep trying that until we figure out all such spots and see whether
we can still move it out at some point.

If that works some day, then the multifd sync on dest QEMU will by default
happen without the BQL.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-08-19 20:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-07  2:41 [PATCH] multifd: Make the main thread yield periodically to the main loop yong.huang
2025-08-07  9:32 ` Lukas Straub
2025-08-07  9:36 ` Lukas Straub
2025-08-08  2:36   ` Yong Huang
2025-08-08  7:01     ` Lukas Straub
2025-08-08  8:02       ` Yong Huang
2025-08-08 13:55         ` Fabiano Rosas
2025-08-08 15:37           ` Peter Xu
2025-08-11  2:25             ` Yong Huang
2025-08-11  7:03             ` Lukas Straub
2025-08-11 13:53               ` Fabiano Rosas
2025-08-19 10:31                 ` Daniel P. Berrangé
2025-08-19 12:03                   ` Lukas Straub
2025-08-19 12:07                     ` Daniel P. Berrangé
2025-08-19 20:03                       ` Peter Xu
2025-08-11  2:27           ` Yong Huang
2025-08-08  6:36 ` Yong Huang
2025-08-08 15:42 ` Peter Xu
2025-08-11  2:02   ` Yong Huang
2025-08-19 10:19 ` Daniel P. Berrangé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).