From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
Wei Wang <wei.w.wang@intel.com>,
Leonardo Bras <leobras@redhat.com>
Subject: Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
Date: Wed, 2 Aug 2023 12:19:32 -0400 [thread overview]
Message-ID: <ZMqCFIXONOxUCeSE@x1n> (raw)
In-Reply-To: <20230802143644.7534-2-farosas@suse.de>
On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> This function currently has a straight-forward part which is waiting
> for the thread to join and a complicated part which is doing a
> qemu_file_shutdown() on the return path file.
>
> The shutdown is tricky because all calls to qemu_file_shutdown() set
> f->last_error to -EIO, which means we can never know if an error is an
> actual error or if we cleanly shutdown the file previously.
>
> This is particularly bothersome for postcopy because it would send the
> return path thread into the retry routine which would wait on the
> postcopy_pause_rp_sem and consequently block the main thread. We
> haven't had reports of this so I must presume we never reach here with
> postcopy.
>
> The shutdown call is also racy because since it doesn't take the
> qemu_file_lock, it could NULL-dereference if the return path thread
> happens to be in the middle of the critical region at
> migration_release_dst_files().
After you rework the thread model on resume, shall we move
migration_release_dst_files() into the migration thread to be after the
pthread_join()? I assume then we don't even need a mutex to protect it?
>
> Move this more complicated part of the code to a separate routine so
> we can wait on the thread without all of this baggage.
I think you mentioned "some nuance" on having mark_source_rp_bad() in
await_return_path_close_on_source(), I did remember I tried to look into
that "nuance" too a long time ago but I just forgot what was that. Great
if you can share some details.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/migration.c | 46 +++++++++++++++++++++++++------------------
> 1 file changed, 27 insertions(+), 19 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 91bba630a8..58f09275a8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2038,6 +2038,25 @@ static int open_return_path_on_source(MigrationState *ms,
> /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
> static int await_return_path_close_on_source(MigrationState *ms)
> {
> + if (!ms->rp_state.rp_thread_created) {
> + return 0;
> + }
> +
> + trace_await_return_path_close_on_source_joining();
> + qemu_thread_join(&ms->rp_state.rp_thread);
> + ms->rp_state.rp_thread_created = false;
> + trace_await_return_path_close_on_source_close();
> + return ms->rp_state.error;
> +}
> +
> +static int close_return_path_on_source(MigrationState *ms)
> +{
> + int ret;
> +
> + if (!ms->rp_state.rp_thread_created) {
> + return 0;
> + }
Can we still rely on the await_return_path_close_on_source() check, so as
to dedup this one?
> +
> /*
> * If this is a normal exit then the destination will send a SHUT and the
> * rp_thread will exit, however if there's an error we need to cause
> @@ -2051,11 +2070,12 @@ static int await_return_path_close_on_source(MigrationState *ms)
> qemu_file_shutdown(ms->rp_state.from_dst_file);
> mark_source_rp_bad(ms);
> }
> - trace_await_return_path_close_on_source_joining();
> - qemu_thread_join(&ms->rp_state.rp_thread);
> - ms->rp_state.rp_thread_created = false;
> - trace_await_return_path_close_on_source_close();
> - return ms->rp_state.error;
> +
> + trace_migration_return_path_end_before();
> + ret = await_return_path_close_on_source(ms);
> + trace_migration_return_path_end_after(ret);
> +
> + return ret;
> }
>
> static inline void
> @@ -2351,20 +2371,8 @@ static void migration_completion(MigrationState *s)
> goto fail;
> }
>
> - /*
> - * If rp was opened we must clean up the thread before
> - * cleaning everything else up (since if there are no failures
> - * it will wait for the destination to send it's status in
> - * a SHUT command).
> - */
> - if (s->rp_state.rp_thread_created) {
> - int rp_error;
> - trace_migration_return_path_end_before();
> - rp_error = await_return_path_close_on_source(s);
> - trace_migration_return_path_end_after(rp_error);
> - if (rp_error) {
> - goto fail;
> - }
> + if (close_return_path_on_source(s)) {
> + goto fail;
> }
>
> if (qemu_file_get_error(s->to_dst_file)) {
> --
> 2.35.3
>
--
Peter Xu
next prev parent reply other threads:[~2023-08-02 16:20 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-02 14:36 [PATCH v2 0/2] Fix segfault on migration return path Fabiano Rosas
2023-08-02 14:36 ` [PATCH v2 1/2] migration: Split await_return_path_close_on_source Fabiano Rosas
2023-08-02 16:19 ` Peter Xu [this message]
2023-08-02 19:58 ` Fabiano Rosas
2023-08-02 20:40 ` Peter Xu
2023-08-03 14:45 ` Fabiano Rosas
2023-08-03 15:15 ` Peter Xu
2023-08-03 15:24 ` Daniel P. Berrangé
2023-08-03 15:39 ` Peter Xu
2023-08-02 14:36 ` [PATCH v2 2/2] migration: Replace the return path retry logic Fabiano Rosas
2023-08-02 16:02 ` Peter Xu
2023-08-02 20:04 ` Fabiano Rosas
2023-08-02 20:44 ` Peter Xu
2023-08-03 15:00 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZMqCFIXONOxUCeSE@x1n \
--to=peterx@redhat.com \
--cc=farosas@suse.de \
--cc=leobras@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=wei.w.wang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).