All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org
Cc: peterx@redhat.com, Juan Quintela <quintela@redhat.com>,
	Xiaohui Li <xiaohli@redhat.com>
Subject: Re: [PATCH v3 08/10] migration: Allow network to fail even during recovery
Date: Thu, 05 Oct 2023 10:25:29 -0300	[thread overview]
Message-ID: <87bkddfbg6.fsf@suse.de> (raw)
In-Reply-To: <20231004220240.167175-9-peterx@redhat.com>

Peter Xu <peterx@redhat.com> writes:

> Normally the postcopy recover phase should only exist for a super short
> period, that's the duration when QEMU is trying to recover from an
> interrupted postcopy migration, during which handshake will be carried out
> for continuing the procedure with state changes from PAUSED -> RECOVER ->
> POSTCOPY_ACTIVE again.
>
> Here RECOVER phase should be super small, that happens right after the
> admin specified a new but working network link for QEMU to reconnect to
> dest QEMU.
>
> However there can still be case where the channel is broken in this small
> RECOVER window.
>
> If it happens, with current code there's no way the src QEMU can got kicked
> out of RECOVER stage. No way either to retry the recover in another channel
> when established.
>
> This patch allows the RECOVER phase to fail itself too - we're mostly
> ready, just some small things missing, e.g. properly kick the main
> migration thread out when sleeping on rp_sem when we found that we're at
> RECOVER stage.  When this happens, it fails the RECOVER itself, and
> rollback to PAUSED stage.  Then the user can retry another round of
> recovery.
>
> To make it even stronger, teach QMP command migrate-pause to explicitly
> kick src/dst QEMU out when needed, so even if for some reason the migration
> thread didn't got kicked out already by a failing rethrn-path thread, the
> admin can also kick it out.
>
> This will be an super, super corner case, but still try to cover that.
>
> One can try to test this with two proxy channels for migration:
>
>   (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
>   (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock
>
> So the migration channel will be:
>
>                       (a)          (b)
>   src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst
>
> Then to make QEMU hang at RECOVER stage, one can do below:
>
>   (1) stop the postcopy using QMP command postcopy-pause
>   (2) kill the 2nd proxy (b)
>   (3) try to recover the postcopy using /tmp/src.sock on src
>   (4) src QEMU will go into RECOVER stage but won't be able to continue
>       from there, because the channel is actually broken at (b)
>
> Before this patch, step (4) will make src QEMU stuck in RECOVER stage,
> without a way to kick the QEMU out or continue the postcopy again.  After
> this patch, (4) will quickly fail qemu and bounce back to PAUSED stage.
>
> Admin can also kick QEMU from (4) into PAUSED when needed using
> migrate-pause when needed.
>
> After bouncing back to PAUSED stage, one can recover again.
>
> Reported-by: Xiaohui Li <xiaohli@redhat.com>
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


  reply	other threads:[~2023-10-05 13:27 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-04 22:02 [PATCH v3 00/10] migration: Better error handling in rp thread, allow failures in recover Peter Xu
2023-10-04 22:02 ` [PATCH v3 01/10] migration: Display error in query-migrate irrelevant of status Peter Xu
2023-10-05  7:28   ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 02/10] migration: Introduce migrate_has_error() Peter Xu
2023-10-05  7:30   ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 03/10] migration: Refactor error handling in source return path Peter Xu
2023-10-05  6:11   ` Philippe Mathieu-Daudé
2023-10-05 16:05     ` Peter Xu
2023-10-08 11:39       ` Philippe Mathieu-Daudé
2023-10-05  8:22   ` Juan Quintela
2023-10-05 19:35     ` Peter Xu
2023-10-05 12:57   ` Fabiano Rosas
2023-10-05 19:35     ` Peter Xu
2023-10-04 22:02 ` [PATCH v3 04/10] migration: Deliver return path file error to migrate state too Peter Xu
2023-10-05  7:32   ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 05/10] qemufile: Always return a verbose error Peter Xu
2023-10-05  7:42   ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 06/10] migration: Remember num of ramblocks to sync during recovery Peter Xu
2023-10-05  7:43   ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 07/10] migration: Add migration_rp_wait|kick() Peter Xu
2023-10-05  7:49   ` Juan Quintela
2023-10-05 20:47     ` Peter Xu
2023-10-04 22:02 ` [PATCH v3 08/10] migration: Allow network to fail even during recovery Peter Xu
2023-10-05 13:25   ` Fabiano Rosas [this message]
2023-10-04 22:02 ` [PATCH v3 09/10] migration: Allow RECOVER->PAUSED convertion for dest qemu Peter Xu
2023-10-05  8:24   ` Juan Quintela
2023-10-04 22:02 ` [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER Peter Xu
2023-10-05 13:24   ` Fabiano Rosas
2023-10-05 13:37     ` Fabiano Rosas
2023-10-05 20:55       ` Peter Xu
2023-10-05 21:10         ` Fabiano Rosas
2023-10-05 21:44           ` Peter Xu
2023-10-05 22:01             ` Fabiano Rosas
2023-10-09 16:50               ` Fabiano Rosas
2023-10-10 16:00                 ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bkddfbg6.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=xiaohli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.