qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Lukas Straub <lukasstraub2@web.de>,
	Juan Quintela <quintela@redhat.com>,
	Li Xiaohui <xiaohli@redhat.com>,
	qemu-devel@nongnu.org, Li Xiaohui <xiaohuixiaohli@redhat.com>,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Subject: Re: [PATCH 1/3] migration: Release return path early for paused postcopy
Date: Mon, 12 Jul 2021 18:44:42 +0100
Message-ID: <YOx/im9h/OJLRQ3N@work-vm>
In-Reply-To: <20210708190653.252961-2-peterx@redhat.com>

* Peter Xu (peterx@redhat.com) wrote:
> When postcopy pause is triggered, we rely on the migration thread to clean up
> the to_dst_file handle, and on the return path thread to clean up the
> from_dst_file handle (which is stored in the local variable "rp").
> 
> In this process, the from_dst_file cleanup (qemu_fclose) is postponed until
> the handle is set up again by a postcopy recovery.
> 
> This worked before yank was introduced.  With yank, we rely on the refcount
> of the IOC to correctly unregister the yank function in channel_close().
> Unless the from_dst_file handle is released early, as soon as postcopy
> pauses, the yank function is left registered for the whole paused period.
> 
> Without this patch, the steps below (quoted from Xiaohui) can crash QEMU on
> the src host:
> 
>   1. Boot the VM on the src host.
>   2. Boot the VM on the dst host.
>   3. Enable postcopy on the src and dst hosts.
>   4. Run stressapptest in the VM and set the postcopy speed to 50M.
>   5. Start migration from the src to the dst host, and switch to postcopy mode while migration is active.
>   6. While postcopy is active, bring down the network card used for migration on the dst host.
>   7. Wait until postcopy is paused on both the src and dst hosts.
>   8. Before bringing the network card back up, recover migration on the dst host; this reports an error.
>   9. Ignore the error from step 8 and continue recovering migration on the src host.
> 
>   After step 9, QEMU on the src host dumps core after a few seconds:
>   qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
>   1.sh: line 38: 44662 Aborted                 (core dumped)
> 
> Reported-by: Li Xiaohui <xiaohuixiaohli@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(and I can clean up the email address problem)

> ---
>  migration/migration.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 5ff7ba9d5c..8786104c9a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2818,12 +2818,12 @@ out:
>               * Maybe there is something we can do: it looks like a
>               * network down issue, and we pause for a recovery.
>               */
> +            qemu_fclose(rp);
> +            ms->rp_state.from_dst_file = NULL;
> +            rp = NULL;
>              if (postcopy_pause_return_path_thread(ms)) {
>                  /* Reload rp, reset the rest */
> -                if (rp != ms->rp_state.from_dst_file) {
> -                    qemu_fclose(rp);
> -                    rp = ms->rp_state.from_dst_file;
> -                }
> +                rp = ms->rp_state.from_dst_file;
>                  ms->rp_state.error = false;
>                  goto retry;
>              }
> -- 
> 2.31.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



Thread overview: 9+ messages
2021-07-08 19:06 [PATCH 0/3] migration: Three more fixes for postcopy recovery Peter Xu
2021-07-08 19:06 ` [PATCH 1/3] migration: Release return path early for paused postcopy Peter Xu
2021-07-08 19:13   ` Peter Xu
2021-07-12 17:44   ` Dr. David Alan Gilbert [this message]
2021-07-08 19:06 ` [PATCH 2/3] migration: Don't do migrate cleanup if during postcopy resume Peter Xu
2021-07-12 18:33   ` Dr. David Alan Gilbert
2021-07-08 19:06 ` [PATCH 3/3] migration: Clear error at entry of migrate_fd_connect() Peter Xu
2021-07-12 18:40   ` Dr. David Alan Gilbert
2021-07-13 11:04 ` [PATCH 0/3] migration: Three more fixes for postcopy recovery Dr. David Alan Gilbert
