From: Juan Quintela <quintela@redhat.com>
To: Li Zhang <lizhang@suse.de>
Cc: qemu-devel@nongnu.org, dgilbert@redhat.com, cfontana@suse.de
Subject: Re: [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever
Date: Fri, 26 Nov 2021 17:33:56 +0100
Message-ID: <87ee72g9l7.fsf@secure.mitica>
In-Reply-To: <20211126153154.25424-2-lizhang@suse.de> (Li Zhang's message of "Fri, 26 Nov 2021 16:31:53 +0100")
Li Zhang <lizhang@suse.de> wrote:
> When doing live migration with 8, 16 or more multifd channels,
> the guest hangs in the presence of network errors such as missing TCP ACKs.
>
> At the sender's side:
> The main thread is blocked in qemu_thread_join. migrate_fd_cleanup
> is called because one thread fails in qio_channel_write_all when
> the network problem happens, while the other send threads are blocked in sendmsg
> and cannot be terminated. So the main thread blocks in qemu_thread_join,
> waiting for those threads to terminate.
>
> (gdb) bt
> 0 0x00007f30c8dcffc0 in __pthread_clockjoin_ex () at /lib64/libpthread.so.0
> 1 0x000055cbb716084b in qemu_thread_join (thread=0x55cbb881f418) at ../util/qemu-thread-posix.c:627
> 2 0x000055cbb6b54e40 in multifd_save_cleanup () at ../migration/multifd.c:542
> 3 0x000055cbb6b4de06 in migrate_fd_cleanup (s=0x55cbb8024000) at ../migration/migration.c:1808
> 4 0x000055cbb6b4dfb4 in migrate_fd_cleanup_bh (opaque=0x55cbb8024000) at ../migration/migration.c:1850
> 5 0x000055cbb7173ac1 in aio_bh_call (bh=0x55cbb7eb98e0) at ../util/async.c:141
> 6 0x000055cbb7173bcb in aio_bh_poll (ctx=0x55cbb7ebba80) at ../util/async.c:169
> 7 0x000055cbb715ba4b in aio_dispatch (ctx=0x55cbb7ebba80) at ../util/aio-posix.c:381
> 8 0x000055cbb7173ffe in aio_ctx_dispatch (source=0x55cbb7ebba80, callback=0x0, user_data=0x0) at ../util/async.c:311
> 9 0x00007f30c9c8cdf4 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
> 10 0x000055cbb71851a2 in glib_pollfds_poll () at ../util/main-loop.c:232
> 11 0x000055cbb718521c in os_host_main_loop_wait (timeout=42251070366) at ../util/main-loop.c:255
> 12 0x000055cbb7185321 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
> 13 0x000055cbb6e6ba27 in qemu_main_loop () at ../softmmu/runstate.c:726
> 14 0x000055cbb6ad6fd7 in main (argc=68, argv=0x7ffc0c578888, envp=0x7ffc0c578ab0) at ../softmmu/main.c:50
>
> At the receiver's side:
> Several receive threads are not created successfully, and the receive threads
> that have been created are blocked in qemu_sem_wait. No semaphores are posted
> because migration does not start unless all the receive threads are created
> successfully, so multifd_recv_sync_main, which posts the semaphore
> to the receive threads, is never called. The receive threads therefore wait
> on the semaphore and never return. They shouldn't wait for the semaphore forever.
> Use qemu_sem_timedwait to wait for a while, then return and close the channels,
> so that the guest doesn't hang anymore.
>
> (gdb) bt
> 0 0x00007fd61c43f064 in do_futex_wait.constprop () at /lib64/libpthread.so.0
> 1 0x00007fd61c43f158 in __new_sem_wait_slow.constprop.0 () at /lib64/libpthread.so.0
> 2 0x000056075916014a in qemu_sem_wait (sem=0x56075b6515f0) at ../util/qemu-thread-posix.c:358
> 3 0x0000560758b56643 in multifd_recv_thread (opaque=0x56075b651550) at ../migration/multifd.c:1112
> 4 0x0000560759160598 in qemu_thread_start (args=0x56075befad00) at ../util/qemu-thread-posix.c:556
> 5 0x00007fd61c43594a in start_thread () at /lib64/libpthread.so.0
> 6 0x00007fd61c158d0f in clone () at /lib64/libc.so.6
>
> Signed-off-by: Li Zhang <lizhang@suse.de>
> ---
> migration/multifd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 7c9deb1921..656239ca2a 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1109,7 +1109,7 @@ static void *multifd_recv_thread(void *opaque)
>
>          if (flags & MULTIFD_FLAG_SYNC) {
>              qemu_sem_post(&multifd_recv_state->sem_sync);
> -            qemu_sem_wait(&p->sem_sync);
> +            qemu_sem_timedwait(&p->sem_sync, 1000);
>          }
>      }
The problem happens here, but I think that the solution is not wrong. We
are returning from the semaphore wait without giving a single error message.
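
To illustrate the concern, here is a minimal standalone sketch (untested
against QEMU itself) of a timed wait whose timeout is reported instead of
silently ignored. It uses plain POSIX semaphores rather than QemuSemaphore;
sem_wait_ms() is a hypothetical stand-in that assumes qemu_sem_timedwait()
returns 0 on success and -1 on timeout, and recv_wait_sync(), the channel id
argument and the message text are made up for the example. It assumes Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical stand-in for qemu_sem_timedwait(), built on POSIX
 * sem_timedwait(): returns 0 if the semaphore was taken, -1 on timeout
 * (or on any other failure).
 */
static int sem_wait_ms(sem_t *sem, int ms)
{
    struct timespec ts;
    int rc;

    /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (long)(ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }

    /* Retry if a signal interrupts the wait. */
    do {
        rc = sem_timedwait(sem, &ts);
    } while (rc == -1 && errno == EINTR);

    return rc == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: wait for the sync semaphore and, on timeout,
 * print an error and return -1 so the caller can stop the thread
 * instead of carrying on as if the sync had happened.
 */
static int recv_wait_sync(sem_t *sem_sync, int channel_id, int timeout_ms)
{
    assert(sem_sync != NULL);

    if (sem_wait_ms(sem_sync, timeout_ms) < 0) {
        fprintf(stderr,
                "multifd recv channel %d: timed out after %d ms "
                "waiting for sync\n", channel_id, timeout_ms);
        return -1;
    }
    return 0;
}
```

In the patch above the return value of qemu_sem_timedwait() is discarded, so
a timeout is indistinguishable from a successful wait; a caller of a helper
like this could break out of the receive loop and record the error when it
returns -1.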
Later, Juan.