From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	Wei Wang <wei.w.wang@intel.com>,
	Leonardo Bras <leobras@redhat.com>
Subject: Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
Date: Wed, 2 Aug 2023 12:19:32 -0400
Message-ID: <ZMqCFIXONOxUCeSE@x1n>
In-Reply-To: <20230802143644.7534-2-farosas@suse.de>

On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> This function currently has a straight-forward part which is waiting
> for the thread to join and a complicated part which is doing a
> qemu_file_shutdown() on the return path file.
> 
> The shutdown is tricky because all calls to qemu_file_shutdown() set
> f->last_error to -EIO, which means we can never know if an error is an
> actual error or if we cleanly shutdown the file previously.
> 
> This is particularly bothersome for postcopy because it would send the
> return path thread into the retry routine which would wait on the
> postcopy_pause_rp_sem and consequently block the main thread. We
> haven't had reports of this so I must presume we never reach here with
> postcopy.
> 
> The shutdown call is also racy because since it doesn't take the
> qemu_file_lock, it could NULL-dereference if the return path thread
> happens to be in the middle of the critical region at
> migration_release_dst_files().

Once you have reworked the thread model on resume, shall we move
migration_release_dst_files() into the migration thread, so that it runs
after the pthread_join()?  I assume we then wouldn't even need a mutex to
protect it?
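
To illustrate the ordering, here is a minimal standalone sketch using plain
pthreads.  It is not QEMU code: RPState, rp_worker(), rp_start(),
rp_release() and rp_join_and_release() are made-up names, and the int pointer
only stands in for from_dst_file.  The assumption is that the worker never
frees the resource itself, so the joining thread can release it without a
lock once pthread_join() has returned.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the return path state, not QEMU's struct. */
typedef struct {
    pthread_t thread;
    bool thread_created;
    int *from_dst;      /* plays the role of from_dst_file */
    int error;
} RPState;

/* The worker only reads the resource; it never frees it. */
static void *rp_worker(void *opaque)
{
    RPState *rp = opaque;

    rp->error = (*rp->from_dst == 0) ? 0 : -1;
    return NULL;
}

static int rp_start(RPState *rp, int value)
{
    rp->from_dst = malloc(sizeof(*rp->from_dst));
    if (!rp->from_dst) {
        return -1;
    }
    *rp->from_dst = value;
    rp->error = 0;
    if (pthread_create(&rp->thread, NULL, rp_worker, rp) != 0) {
        free(rp->from_dst);
        rp->from_dst = NULL;
        return -1;
    }
    rp->thread_created = true;
    return 0;
}

/* Only legal once the worker has been joined, hence no lock is taken. */
static void rp_release(RPState *rp)
{
    assert(!rp->thread_created);
    free(rp->from_dst);
    rp->from_dst = NULL;
}

/* Join first, release second; returns the error the worker recorded. */
static int rp_join_and_release(RPState *rp)
{
    if (!rp->thread_created) {
        return 0;
    }
    pthread_join(rp->thread, NULL);
    rp->thread_created = false;
    rp_release(rp);
    return rp->error;
}
```

Calling rp_join_and_release() a second time is a no-op, since the flag is
cleared before the release.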

> 
> Move this more complicated part of the code to a separate routine so
> we can wait on the thread without all of this baggage.

I think you mentioned "some nuance" about having mark_source_rp_bad() in
await_return_path_close_on_source().  I remember trying to look into that
"nuance" a long time ago too, but I have forgotten what it was.  It would be
great if you could share some details.

> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/migration.c | 46 +++++++++++++++++++++++++------------------
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 91bba630a8..58f09275a8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2038,6 +2038,25 @@ static int open_return_path_on_source(MigrationState *ms,
>  /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
>  static int await_return_path_close_on_source(MigrationState *ms)
>  {
> +    if (!ms->rp_state.rp_thread_created) {
> +        return 0;
> +    }
> +
> +    trace_await_return_path_close_on_source_joining();
> +    qemu_thread_join(&ms->rp_state.rp_thread);
> +    ms->rp_state.rp_thread_created = false;
> +    trace_await_return_path_close_on_source_close();
> +    return ms->rp_state.error;
> +}
> +
> +static int close_return_path_on_source(MigrationState *ms)
> +{
> +    int ret;
> +
> +    if (!ms->rp_state.rp_thread_created) {
> +        return 0;
> +    }

Can we rely on the check in await_return_path_close_on_source() instead, so
that this duplicate one can be dropped?
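
Roughly what I mean, as a simplified sketch rather than a patch.  DemoState,
demo_await() and demo_close() are hypothetical names, the bool and counter
fields only simulate qemu_file_get_error() and qemu_file_shutdown(), and it
assumes from_dst_file is NULL whenever the return path thread was never
created:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified state; field names only mirror the patch. */
typedef struct {
    bool rp_thread_created;
    bool to_dst_error;      /* simulates qemu_file_get_error(to_dst_file) */
    void *from_dst_file;    /* assumed NULL when no return path exists */
    int shutdown_calls;     /* counts simulated qemu_file_shutdown() calls */
    int error;
} DemoState;

/* The only place that checks rp_thread_created. */
static int demo_await(DemoState *s)
{
    if (!s->rp_thread_created) {
        return 0;
    }
    /* The real code would join the thread here. */
    s->rp_thread_created = false;
    return s->error;
}

static int demo_close(DemoState *s)
{
    /*
     * No rp_thread_created check here: the file pointer test guards the
     * shutdown and demo_await() guards the join.
     */
    if (s->from_dst_file && s->to_dst_error) {
        s->shutdown_calls++;
    }
    return demo_await(s);
}
```

One visible difference in the real function would be that the tracepoints
fire even when there is no thread to join.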

> +
>      /*
>       * If this is a normal exit then the destination will send a SHUT and the
>       * rp_thread will exit, however if there's an error we need to cause
> @@ -2051,11 +2070,12 @@ static int await_return_path_close_on_source(MigrationState *ms)
>          qemu_file_shutdown(ms->rp_state.from_dst_file);
>          mark_source_rp_bad(ms);
>      }
> -    trace_await_return_path_close_on_source_joining();
> -    qemu_thread_join(&ms->rp_state.rp_thread);
> -    ms->rp_state.rp_thread_created = false;
> -    trace_await_return_path_close_on_source_close();
> -    return ms->rp_state.error;
> +
> +    trace_migration_return_path_end_before();
> +    ret = await_return_path_close_on_source(ms);
> +    trace_migration_return_path_end_after(ret);
> +
> +    return ret;
>  }
>  
>  static inline void
> @@ -2351,20 +2371,8 @@ static void migration_completion(MigrationState *s)
>          goto fail;
>      }
>  
> -    /*
> -     * If rp was opened we must clean up the thread before
> -     * cleaning everything else up (since if there are no failures
> -     * it will wait for the destination to send it's status in
> -     * a SHUT command).
> -     */
> -    if (s->rp_state.rp_thread_created) {
> -        int rp_error;
> -        trace_migration_return_path_end_before();
> -        rp_error = await_return_path_close_on_source(s);
> -        trace_migration_return_path_end_after(rp_error);
> -        if (rp_error) {
> -            goto fail;
> -        }
> +    if (close_return_path_on_source(s)) {
> +        goto fail;
>      }
>  
>      if (qemu_file_get_error(s->to_dst_file)) {
> -- 
> 2.35.3
> 

-- 
Peter Xu


