* [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
From: Peter Xu @ 2021-06-29 18:13 UTC
To: qemu-devel
Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
    Dr . David Alan Gilbert, Juan Quintela

The 1st patch should fix the unregistering of the yank instance; I think it
should also fix the issue that Leonardo was trying to fix in this patch:

  https://lore.kernel.org/qemu-devel/20210629050522.147057-1-leobras@redhat.com/

The 2nd patch fixes postcopy recovery being unable to retry if e.g. the 1st
attempt provided a wrong port address.

Note that the multifd zstd test may fail if migration-test is run with sudo
on master (which seems to be a known issue now), and it'll still fail after
these two patches are applied; however, all the tests that do run behave as
usual.

(Leo: please let me know if this series doesn't fix the issue you were trying
to fix)

Please review, thanks.

Peter Xu (2):
  migration: Move yank outside qemu_start_incoming_migration()
  migration: Allow reset of postcopy_recover_triggered when failed

 migration/migration.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

--
2.31.1
* [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration()
From: Peter Xu @ 2021-06-29 18:13 UTC
To: qemu-devel
Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
    Dr . David Alan Gilbert, Juan Quintela

Starting from commit b5eea99ec2f5c, qmp_migrate_recover() calls unregister
before calling qemu_start_incoming_migration().  I believe it wanted to
mitigate the next call to yank_register_instance(), but I think that's wrong.

Firstly, during recovery we should keep the yank instance there, not
"quickly remove it and add it back".

Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2f5c will
directly crash the dest qemu (right now a second call is not possible, but it
becomes possible with the next patch), because the 1st call of
qmp_migrate_recover() will unregister permanently when the channel fails to
establish, then the 2nd call of qmp_migrate_recover() crashes at
yank_unregister_instance().

This patch fixes it by moving the yank ops out of
qemu_start_incoming_migration() into qmp_migrate_incoming().  For
qmp_migrate_recover(), drop the unregister of the yank instance too, since we
keep it there during the recovery phase.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4228635d18..1bb03d1eca 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -456,10 +456,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;
 
-    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
-        return;
-    }
-
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
@@ -474,7 +470,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
-        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -2083,9 +2078,14 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
         return;
     }
 
+    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
+        return;
+    }
+
     qemu_start_incoming_migration(uri, &local_err);
 
     if (local_err) {
+        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         error_propagate(errp, local_err);
         return;
     }
@@ -2114,7 +2114,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
-    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
     qemu_start_incoming_migration(uri, errp);
 }
 
--
2.31.1
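The ownership rule in patch 1 (the caller of the initial incoming migration
registers the yank instance and undoes that on failure, while recovery leaves
the instance alone) can be sketched with a toy model.  This is only an
illustration under stated assumptions: every toy_* name, the single-flag
"registry" and the rule that only "tcp:" URIs succeed are hypothetical
stand-ins, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for the yank registry: a single instance tracked by a flag. */
static bool registered;
/* Counts unregister calls on a missing instance (the crash case in the text). */
static int bad_unregisters;

static bool toy_yank_register(void)
{
    if (registered) {
        return false;           /* already registered */
    }
    registered = true;
    return true;
}

static void toy_yank_unregister(void)
{
    if (!registered) {
        bad_unregisters++;      /* recorded here instead of aborting */
        return;
    }
    registered = false;
}

/* Toy channel setup with no yank calls inside; only "tcp:" URIs succeed. */
static bool toy_start_incoming(const char *uri)
{
    return strncmp(uri, "tcp:", 4) == 0;
}

/* Initial incoming migration: the caller registers and undoes it on failure. */
static bool toy_migrate_incoming(const char *uri)
{
    if (!toy_yank_register()) {
        return false;
    }
    if (!toy_start_incoming(uri)) {
        toy_yank_unregister();
        return false;
    }
    return true;
}

/* Recovery: the instance stays registered whether or not the attempt works. */
static bool toy_migrate_recover(const char *uri)
{
    assert(registered);         /* recovery follows a successful incoming */
    return toy_start_incoming(uri);
}
```

In this model, any number of failed toy_migrate_recover() calls leaves the
instance registered and bad_unregisters at zero, which is the property the
patch is after.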
* Re: [PATCH 1/2] migration: Move yank outside qemu_start_incoming_migration()
From: Dr. David Alan Gilbert @ 2021-06-30 15:33 UTC
To: Peter Xu
Cc: Lukas Straub, Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Starting from commit b5eea99ec2f5c, qmp_migrate_recover() calls unregister
> before calling qemu_start_incoming_migration().  I believe it wanted to
> mitigate the next call to yank_register_instance(), but I think that's
> wrong.
>
> Firstly, during recovery we should keep the yank instance there, not
> "quickly remove it and add it back".
>
> Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2f5c will
> directly crash the dest qemu (right now a second call is not possible, but
> it becomes possible with the next patch), because the 1st call of
> qmp_migrate_recover() will unregister permanently when the channel fails to
> establish, then the 2nd call of qmp_migrate_recover() crashes at
> yank_unregister_instance().
>
> This patch fixes it by moving the yank ops out of
> qemu_start_incoming_migration() into qmp_migrate_incoming().  For
> qmp_migrate_recover(), drop the unregister of the yank instance too, since
> we keep it there during the recovery phase.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 4228635d18..1bb03d1eca 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -456,10 +456,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
>      const char *p = NULL;
>
> -    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
> -        return;
> -    }
> -
>      qapi_event_send_migration(MIGRATION_STATUS_SETUP);
>      if (strstart(uri, "tcp:", &p) ||
>          strstart(uri, "unix:", NULL) ||
> @@ -474,7 +470,6 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>      } else if (strstart(uri, "fd:", &p)) {
>          fd_start_incoming_migration(p, errp);
>      } else {
> -        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>          error_setg(errp, "unknown migration protocol: %s", uri);
>      }
>  }
> @@ -2083,9 +2078,14 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
>          return;
>      }
>
> +    if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
> +        return;
> +    }
> +
>      qemu_start_incoming_migration(uri, &local_err);
>
>      if (local_err) {
> +        yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>          error_propagate(errp, local_err);
>          return;
>      }
> @@ -2114,7 +2114,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>       * only re-setup the migration stream and poke existing migration
>       * to continue using that newly established channel.
>       */
> -    yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>      qemu_start_incoming_migration(uri, errp);
>  }
>
> --
> 2.31.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed
From: Peter Xu @ 2021-06-29 18:13 UTC
To: qemu-devel
Cc: peterx, Lukas Straub, Leonardo Bras Soares Passos,
    Dr . David Alan Gilbert, Juan Quintela

qemu_start_incoming_migration() can fail at any point; when that happens we
should reset postcopy_recover_triggered to false so that the user can still
retry with a saner incoming port.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 1bb03d1eca..fcca289ef7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2097,6 +2097,13 @@ void qmp_migrate_recover(const char *uri, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
 
+    /*
+     * Don't even bother to use ERRP_GUARD() as it _must_ always be set by
+     * callers (no one should ignore a recover failure); if there is, it's a
+     * programming error.
+     */
+    assert(errp);
+
     if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
         error_setg(errp, "Migrate recover can only be run "
                    "when postcopy is paused.");
@@ -2115,6 +2122,12 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * to continue using that newly established channel.
      */
     qemu_start_incoming_migration(uri, errp);
+
+    /* Safe to dereference with the assert above */
+    if (*errp) {
+        /* Reset the flag so user could still retry */
+        qatomic_set(&mis->postcopy_recover_triggered, false);
+    }
 }
 
 void qmp_migrate_pause(Error **errp)
--
2.31.1
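The retry behaviour that patch 2 enables can be sketched in the same toy
style.  Again this is only a sketch under stated assumptions: ToyError, the
toy_* functions and the plain bool flag are hypothetical stand-ins (the real
code uses QEMU's Error API and qatomic_set()), and the "already triggered"
guard models surrounding code that the diff does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct ToyError {
    const char *msg;
} ToyError;

/* Models mis->postcopy_recover_triggered; a plain bool, not an atomic. */
static bool recover_triggered;

/* Toy channel setup: reports an error for anything but a "tcp:" URI. */
static void toy_start_incoming(const char *uri, ToyError **errp)
{
    static ToyError bad_uri = { "unknown migration protocol" };

    if (strncmp(uri, "tcp:", 4) != 0) {
        *errp = &bad_uri;
    }
}

static void toy_migrate_recover(const char *uri, ToyError **errp)
{
    static ToyError busy = { "recovery is triggered already" };

    /* Callers must pass an error pointer, mirroring assert(errp). */
    assert(errp);

    /* Assumed guard: refuse a new attempt while one is still pending. */
    if (recover_triggered) {
        *errp = &busy;
        return;
    }
    recover_triggered = true;

    toy_start_incoming(uri, errp);

    /* Safe to dereference thanks to the assert above. */
    if (*errp) {
        /* Reset the flag so that the user can retry. */
        recover_triggered = false;
    }
}
```

With the reset in place, a failed attempt (for example a wrong URI) leaves
recover_triggered false, so a second call with a corrected URI gets through
instead of being rejected by the guard.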
* Re: [PATCH 2/2] migration: Allow reset of postcopy_recover_triggered when failed
From: Dr. David Alan Gilbert @ 2021-06-30 15:39 UTC
To: Peter Xu
Cc: Lukas Straub, Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> qemu_start_incoming_migration() can fail at any point; when that happens we
> should reset postcopy_recover_triggered to false so that the user can still
> retry with a saner incoming port.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 1bb03d1eca..fcca289ef7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2097,6 +2097,13 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
>
> +    /*
> +     * Don't even bother to use ERRP_GUARD() as it _must_ always be set by
> +     * callers (no one should ignore a recover failure); if there is, it's a
> +     * programming error.
> +     */
> +    assert(errp);
> +
>      if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
>          error_setg(errp, "Migrate recover can only be run "
>                     "when postcopy is paused.");
> @@ -2115,6 +2122,12 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>       * to continue using that newly established channel.
>       */
>      qemu_start_incoming_migration(uri, errp);
> +
> +    /* Safe to dereference with the assert above */
> +    if (*errp) {
> +        /* Reset the flag so user could still retry */
> +        qatomic_set(&mis->postcopy_recover_triggered, false);
> +    }
>  }
>
>  void qmp_migrate_pause(Error **errp)
> --
> 2.31.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
From: Peter Xu @ 2021-06-29 19:00 UTC
To: qemu-devel
Cc: Lukas Straub, Leonardo Bras Soares Passos, Dr . David Alan Gilbert,
    Juan Quintela

On Tue, Jun 29, 2021 at 02:13:54PM -0400, Peter Xu wrote:
> Note that the multifd zstd test may fail if migration-test is run with sudo
> on master (which seems to be a known issue now), and it'll still fail after
> these two patches are applied; however, all the tests that do run behave as
> usual.

There was an unexpected accident; please ignore this paragraph, as the zstd
test actually passes with or without the patchset applied.

--
Peter Xu
* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
From: Leonardo Bras Soares Passos @ 2021-06-29 22:38 UTC
To: Peter Xu
Cc: Lukas Straub, qemu-devel, Dr . David Alan Gilbert, Juan Quintela

On Tue, Jun 29, 2021 at 3:14 PM Peter Xu <peterx@redhat.com> wrote:
>
> The 1st patch should fix the unregistering of the yank instance; I think it
> should also fix the issue that Leonardo was trying to fix in this patch:
>
>   https://lore.kernel.org/qemu-devel/20210629050522.147057-1-leobras@redhat.com/
>
> The 2nd patch fixes postcopy recovery being unable to retry if e.g. the 1st
> attempt provided a wrong port address.
>
> Note that the multifd zstd test may fail if migration-test is run with sudo
> on master (which seems to be a known issue now), and it'll still fail after
> these two patches are applied; however, all the tests that do run behave as
> usual.
>
> (Leo: please let me know if this series doesn't fix the issue you were
> trying to fix)

It does fix the issue, as far as I tested.

>
> Please review, thanks.
>
> Peter Xu (2):
>   migration: Move yank outside qemu_start_incoming_migration()
>   migration: Allow reset of postcopy_recover_triggered when failed
>
>  migration/migration.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
>
> --
> 2.31.1
>
* Re: [PATCH 0/2] migration: Two fixes around yank and postcopy recovery
From: Peter Xu @ 2021-06-29 23:52 UTC
To: Leonardo Bras Soares Passos
Cc: Lukas Straub, qemu-devel, Dr . David Alan Gilbert, Juan Quintela

On Tue, Jun 29, 2021 at 07:38:32PM -0300, Leonardo Bras Soares Passos wrote:
> > (Leo: please let me know if this series doesn't fix the issue you were
> > trying to fix)
>
> It does fix the issue, as far as I tested.

Thanks, Leo!

--
Peter Xu