From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Leonardo Bras <leobras@redhat.com>
Cc: Li Xiaohui <xiaohli@redhat.com>,
Lukas Straub <lukasstraub2@web.de>,
qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH 1/1] migration: Terminate multifd threads on yank
Date: Mon, 2 Aug 2021 16:35:20 +0100
Message-ID: <YQgQuCdc8jBKRyLc@work-vm>
In-Reply-To: <20210730074043.54260-1-leobras@redhat.com>
* Leonardo Bras (leobras@redhat.com) wrote:
> From source host viewpoint, losing a connection during migration will
> cause the sockets to get stuck in sendmsg() syscall, waiting for
> the receiving side to reply.
>
> In migration, yank works by shutting-down the migration QIOChannel fd.
> This causes a failure in the next sendmsg() for that fd, and the whole
> migration gets cancelled.
>
> In multifd, due to having multiple sockets in multiple threads,
> on a connection loss there will be extra sockets stuck in sendmsg(),
> and because they will be holding their own mutex, there is a good chance
> the main migration thread can get stuck in multifd_send_pages()
> waiting for one of those mutexes.
>
> While it's waiting, the main migration thread can't run sendmsg() on
> its fd, and therefore can't cause the migration to be cancelled, thus
> causing yank not to work.
>
> Fix this by shutting down all migration fds (including multifd ones),
> so no thread gets stuck in sendmsg() while holding a lock, thus
> allowing the main migration thread to properly cancel migration when
> yank is used.
>
> There is no need to do the same procedure for yank to work in the
> receiving host since ops->recv_pages() is kept outside the mutex protected
> code in multifd_recv_thread().
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1970337
> Reported-by: Li Xiaohui <xiaohli@redhat.com>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/multifd.c        | 11 +++++++++++
>  migration/multifd.h        |  1 +
>  migration/yank_functions.c |  2 ++
>  3 files changed, 14 insertions(+)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 377da78f5b..744a180dfe 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1040,6 +1040,17 @@ void multifd_recv_sync_main(void)
>      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
>  }
>
> +void multifd_shutdown(void)
> +{
> +    if (!migrate_use_multifd()) {
> +        return;
> +    }
> +
> +    if (multifd_send_state) {
> +        multifd_send_terminate_threads(NULL);
> +    }
That calls:
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];
        qemu_mutex_lock(&p->mutex);
        p->quit = true;
        qemu_sem_post(&p->sem);
        qemu_mutex_unlock(&p->mutex);
    }
so why doesn't this also get stuck in the same mutex you're trying to
fix?
Does the qio_channel_shutdown actually cause a shutdown on all fds
for the multifd channels?
(I've just seen the multifd/cancel test fail, stuck in multifd_send_sync_main()
waiting on one of the locks.)
Dave
> +}
> +
>  static void *multifd_recv_thread(void *opaque)
>  {
>      MultiFDRecvParams *p = opaque;
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 8d6751f5ed..0517213bdf 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -22,6 +22,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>  void multifd_recv_sync_main(void);
>  void multifd_send_sync_main(QEMUFile *f);
>  int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
> +void multifd_shutdown(void);
>
>  /* Multifd Compression flags */
>  #define MULTIFD_FLAG_SYNC (1 << 0)
> diff --git a/migration/yank_functions.c b/migration/yank_functions.c
> index 8c08aef14a..9335a64f00 100644
> --- a/migration/yank_functions.c
> +++ b/migration/yank_functions.c
> @@ -15,12 +15,14 @@
>  #include "io/channel-socket.h"
>  #include "io/channel-tls.h"
>  #include "qemu-file.h"
> +#include "multifd.h"
>
>  void migration_yank_iochannel(void *opaque)
>  {
>      QIOChannel *ioc = QIO_CHANNEL(opaque);
>
>      qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +    multifd_shutdown();
>  }
>
>  /* Return whether yank is supported on this ioc */
> --
> 2.32.0
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK