From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Leonardo Bras <leobras@redhat.com>
Cc: Li Xiaohui <xiaohli@redhat.com>,
Lukas Straub <lukasstraub2@web.de>,
qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH 1/1] migration: Terminate multifd threads on yank
Date: Mon, 2 Aug 2021 16:35:20 +0100
Message-ID: <YQgQuCdc8jBKRyLc@work-vm>
In-Reply-To: <20210730074043.54260-1-leobras@redhat.com>
* Leonardo Bras (leobras@redhat.com) wrote:
> From the source host's viewpoint, losing a connection during migration
> causes the sockets to get stuck in the sendmsg() syscall, waiting for
> the receiving side to reply.
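A minimal sketch of that stall, assuming Linux and using a local socketpair() whose peer never reads as a stand-in for a destination that has stopped accepting data. fill_until_stuck() is a hypothetical helper, not QEMU code. The writer keeps calling send() until the socket buffer is full. SO_SNDTIMEO is set only so that the demo returns; without it, the final send() would simply block.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Keep writing to a socket whose peer never reads. Without a send timeout
 * this loop would block forever inside send(); SO_SNDTIMEO bounds the wait
 * so the demo returns. Returns the errno that ended the loop and stores the
 * number of bytes the kernel accepted before the writer stalled. */
int fill_until_stuck(long *accepted)
{
    int sv[2];
    static char buf[16 * 1024];
    struct timeval tv = { .tv_sec = 0, .tv_usec = 200 * 1000 };
    int err = 0;

    *accepted = 0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return errno;
    }
    setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));

    for (;;) {
        /* MSG_NOSIGNAL: report errors as errno, never as SIGPIPE. */
        ssize_t n = send(sv[0], buf, sizeof(buf), MSG_NOSIGNAL);
        if (n < 0) {
            err = errno;   /* EAGAIN once the timeout expires */
            break;
        }
        *accepted += n;
    }

    close(sv[0]);
    close(sv[1]);
    return err;
}
```

For a real TCP connection the sender likewise blocks once its send buffer is full and no acknowledgements arrive; the socketpair just makes that easy to reproduce locally.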
>
> In migration, yank works by shutting down the migration QIOChannel fd.
> This causes a failure in the next sendmsg() for that fd, and the whole
> migration gets cancelled.
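For a plain socket channel, qio_channel_shutdown() with QIO_CHANNEL_SHUTDOWN_BOTH comes down to shutdown(fd, SHUT_RDWR). The sketch below assumes Linux and uses a socketpair() in place of the migration channel; send_after_shutdown() is a hypothetical helper. It shows the effect described above: the next send() on the shut-down fd fails with EPIPE. MSG_NOSIGNAL makes the failure come back as an error rather than as a SIGPIPE.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Shut down one end of a connected socket pair (roughly what yank does to
 * the migration fd), then try to write to it.
 * Returns the errno reported by send(), 0 if send() succeeded, or -1 if
 * the socket pair could not be created. */
int send_after_shutdown(void)
{
    int sv[2];
    char byte = 'x';

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    shutdown(sv[0], SHUT_RDWR);

    /* MSG_NOSIGNAL: get EPIPE back instead of a SIGPIPE signal. */
    ssize_t n = send(sv[0], &byte, 1, MSG_NOSIGNAL);
    int err = (n < 0) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    return err;
}
```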
>
> In multifd, due to having multiple sockets in multiple threads,
> on a connection loss there will be extra sockets stuck in sendmsg(),
> and because they will be holding their own mutex, there is a good chance
> the main migration thread can get stuck in multifd_send_pages()
> waiting for one of those mutexes.
>
> While it's waiting, the main migration thread can't run sendmsg() on
> its fd, and therefore can't cause the migration to be cancelled, so
> yank does not work.
>
> Fix this by shutting down all migration fds (including the multifd ones),
> so that no thread gets stuck in sendmsg() while holding a lock, which
> allows the main migration thread to properly cancel the migration when
> yank is used.
>
> There is no need to do the same for yank to work on the receiving host,
> since ops->recv_pages() is kept outside the mutex-protected code in
> multifd_recv_thread().
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1970337
> Reported-by: Li Xiaohui <xiaohli@redhat.com>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
> migration/multifd.c | 11 +++++++++++
> migration/multifd.h | 1 +
> migration/yank_functions.c | 2 ++
> 3 files changed, 14 insertions(+)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 377da78f5b..744a180dfe 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1040,6 +1040,17 @@ void multifd_recv_sync_main(void)
>      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
> }
>
> +void multifd_shutdown(void)
> +{
> +    if (!migrate_use_multifd()) {
> +        return;
> +    }
> +
> +    if (multifd_send_state) {
> +        multifd_send_terminate_threads(NULL);
> +    }
That calls:
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];
        qemu_mutex_lock(&p->mutex);
        p->quit = true;
        qemu_sem_post(&p->sem);
        qemu_mutex_unlock(&p->mutex);
    }
So why doesn't this also get stuck on the same mutex you're trying to
fix?
Does the qio_channel_shutdown() actually cause a shutdown on all the fds
for multifd?
(I've just seen the multifd/cancel test fail, stuck in multifd_send_sync_main()
waiting on one of the locks.)
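Regarding the second question: at the socket level, shutdown(2) affects only the socket it is called on. The minimal sketch below assumes Linux, with one socketpair() standing in for the main migration channel and another for a multifd channel; shutdown_is_per_socket() is a hypothetical name. Shutting down the first socket leaves the second fully writable, so a thread blocked on a multifd socket would be released only if that socket were shut down too. Whether QEMU's yank path does this depends on which channels are registered with yank, and the sketch does not model that.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins: main_sv models the main migration channel and
 * mfd_sv one multifd channel. Only main_sv[0] is shut down. Stores the
 * errno from writing to the main channel and the byte count written to
 * the multifd channel. Returns 0 on success, -1 if setup failed. */
int shutdown_is_per_socket(int *main_errno, ssize_t *multifd_sent)
{
    int main_sv[2], mfd_sv[2];
    char byte = 'x';

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, main_sv) < 0) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, mfd_sv) < 0) {
        close(main_sv[0]);
        close(main_sv[1]);
        return -1;
    }

    shutdown(main_sv[0], SHUT_RDWR);

    ssize_t n = send(main_sv[0], &byte, 1, MSG_NOSIGNAL);
    *main_errno = (n < 0) ? errno : 0;
    /* The other socket is untouched and still accepts data. */
    *multifd_sent = send(mfd_sv[0], &byte, 1, MSG_NOSIGNAL);

    close(main_sv[0]);
    close(main_sv[1]);
    close(mfd_sv[0]);
    close(mfd_sv[1]);
    return 0;
}
```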
Dave
> +}
> +
> static void *multifd_recv_thread(void *opaque)
> {
>      MultiFDRecvParams *p = opaque;
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 8d6751f5ed..0517213bdf 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -22,6 +22,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
> void multifd_recv_sync_main(void);
> void multifd_send_sync_main(QEMUFile *f);
> int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
> +void multifd_shutdown(void);
>
> /* Multifd Compression flags */
> #define MULTIFD_FLAG_SYNC (1 << 0)
> diff --git a/migration/yank_functions.c b/migration/yank_functions.c
> index 8c08aef14a..9335a64f00 100644
> --- a/migration/yank_functions.c
> +++ b/migration/yank_functions.c
> @@ -15,12 +15,14 @@
> #include "io/channel-socket.h"
> #include "io/channel-tls.h"
> #include "qemu-file.h"
> +#include "multifd.h"
>
> void migration_yank_iochannel(void *opaque)
> {
>      QIOChannel *ioc = QIO_CHANNEL(opaque);
>
>      qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +    multifd_shutdown();
> }
>
> /* Return whether yank is supported on this ioc */
> --
> 2.32.0
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK