* [PATCH] migration: Fix state transition in postcopy_start() error handling
@ 2025-08-26 11:51 Juraj Marcin
2025-08-26 18:23 ` Peter Xu
2025-09-27 14:01 ` Michael Tokarev
0 siblings, 2 replies; 5+ messages in thread
From: Juraj Marcin @ 2025-08-26 11:51 UTC
To: qemu-devel; +Cc: Juraj Marcin, Fabiano Rosas, Peter Xu, qemu-stable
From: Juraj Marcin <jmarcin@redhat.com>
Commit 48814111366b ("migration: Always set DEVICE state") introduced
DEVICE state to postcopy, which moved the actual state transition that
leads to POSTCOPY_ACTIVE.
However, the error handling part of the postcopy_start() function still
expects the state POSTCOPY_ACTIVE. Depending on where an error happens,
the state can now be ACTIVE, DEVICE or CANCELLING, but never
POSTCOPY_ACTIVE, as this transition now happens just before a
successful return from the function.
Instead, accept any state except CANCELLING when transitioning to FAILED
state.
Cc: qemu-stable@nongnu.org
Fixes: 48814111366b ("migration: Always set DEVICE state")
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
In the RFC[1] where this patch was discussed, there was also a
suggestion for a helper function migrate_set_failure() that would check
if the state is not CANCELLING and then set migration error and FAILED
state. I discussed the implementation with Peter, and we came to a
conclusion that instead of patching such clean-up on top of the current
error handling code, it might be more useful to do a larger refactor and
clean-up of all error handling in the migration code.
Such clean-up should reduce the number of places where we need to
explicitly transition to a FAILED state (ideally to one, or only a
couple of places), and instead only set an appropriate migration error
using migrate_set_error(). Additionally, it would also refactor
inappropriate uses of QEMUFile errors where the error is not really an
error of the underlying channel and migrate_set_error() should be used
instead.
[1]: https://lore.kernel.org/all/20250807114922.1013286-3-jmarcin@redhat.com/
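
As a rough illustration of the helper idea mentioned in the note above, a
standalone sketch in C could look like the following. This is not QEMU code
and was not part of the patch: the sketch_* names, the simplified status
enum, and the "keep the first error" behaviour are all assumptions made for
the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for MIGRATION_STATUS_* values. */
enum sketch_status {
    SKETCH_ACTIVE,
    SKETCH_DEVICE,
    SKETCH_CANCELLING,
    SKETCH_FAILED,
};

/* Hypothetical, minimal stand-in for the migration state object. */
struct sketch_migration {
    _Atomic int state;
    const char *error;   /* first recorded error, or NULL */
};

/* Record an error; this sketch assumes only the first error is kept. */
static void sketch_set_error(struct sketch_migration *m, const char *msg)
{
    assert(m != NULL);
    if (m->error == NULL) {
        m->error = msg;
    }
}

/*
 * Sketch of the suggested helper: if the migration is not being
 * cancelled, record the error and try to move to FAILED.
 * Returns true only if the state was actually changed to FAILED.
 */
static bool sketch_set_failure(struct sketch_migration *m, const char *msg)
{
    int cur = atomic_load(&m->state);

    if (cur == SKETCH_CANCELLING) {
        return false;    /* leave a cancellation in progress untouched */
    }
    sketch_set_error(m, msg);
    /* Fails (and leaves the state alone) if it changed since the load. */
    return atomic_compare_exchange_strong(&m->state, &cur, SKETCH_FAILED);
}
```

With such a helper, an error path would call sketch_set_failure() once
instead of open-coding the CANCELLING check next to every transition to
FAILED.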
---
 migration/migration.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 10c216d25d..32b8ce5613 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 fail_closefb:
     qemu_fclose(fb);
 fail:
-    migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                      MIGRATION_STATUS_FAILED);
+    if (ms->state != MIGRATION_STATUS_CANCELLING) {
+        migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+    }
     migration_block_activate(NULL);
     migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
     bql_unlock();
--
2.50.1
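
To illustrate why the old error path could not reach FAILED, here is a
minimal standalone model in C. It assumes the state setter behaves like a
compare-and-swap, so the transition happens only when the current state
equals the expected old state. The enum values and the model_set_state(),
fail_path_old() and fail_path_new() names are hypothetical and exist only
for this sketch; they are not QEMU code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few MIGRATION_STATUS_* values. */
enum mig_status {
    MIG_ACTIVE,
    MIG_DEVICE,
    MIG_POSTCOPY_ACTIVE,
    MIG_CANCELLING,
    MIG_FAILED,
};

/*
 * Model of a compare-and-swap transition: the state becomes new_state
 * only if it currently equals old_state; otherwise nothing changes.
 */
static void model_set_state(_Atomic int *state, int old_state, int new_state)
{
    int expected = old_state;

    assert(state != NULL);
    atomic_compare_exchange_strong(state, &expected, new_state);
}

/* Error path before the patch: expects POSTCOPY_ACTIVE as the old state. */
static void fail_path_old(_Atomic int *state)
{
    model_set_state(state, MIG_POSTCOPY_ACTIVE, MIG_FAILED);
}

/* Error path after the patch: any state except CANCELLING becomes FAILED. */
static void fail_path_new(_Atomic int *state)
{
    int cur = atomic_load(state);

    if (cur != MIG_CANCELLING) {
        model_set_state(state, cur, MIG_FAILED);
    }
}
```

In this model, calling fail_path_old() on a state of MIG_ACTIVE or
MIG_DEVICE leaves it unchanged, because the expected old state never
matches. fail_path_new() moves both to MIG_FAILED and leaves
MIG_CANCELLING alone.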
* Re: [PATCH] migration: Fix state transition in postcopy_start() error handling
2025-08-26 11:51 [PATCH] migration: Fix state transition in postcopy_start() error handling Juraj Marcin
@ 2025-08-26 18:23 ` Peter Xu
2025-08-26 19:00 ` Fabiano Rosas
2025-09-27 14:01 ` Michael Tokarev
1 sibling, 1 reply; 5+ messages in thread
From: Peter Xu @ 2025-08-26 18:23 UTC
To: Juraj Marcin; +Cc: qemu-devel, Fabiano Rosas, qemu-stable
On Tue, Aug 26, 2025 at 01:51:40PM +0200, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> Commit 48814111366b ("migration: Always set DEVICE state") introduced
> DEVICE state to postcopy, which moved the actual state transition that
> leads to POSTCOPY_ACTIVE.
>
> However, the error handling part of the postcopy_start() function still
> expects the state POSTCOPY_ACTIVE, but depending on where an error
> happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but
> never POSTCOPY_ACTIVE, as this transition now happens just before a
> successful return from the function.
>
> Instead, accept any state except CANCELLING when transitioning to FAILED
> state.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 48814111366b ("migration: Always set DEVICE state")
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Thanks, Juraj!
Reviewed-by: Peter Xu <peterx@redhat.com>
>
> ---
> In the RFC[1] where this patch was discussed, there was also a
> suggestion for a helper function migrate_set_failure() that would check
> if the state is not CANCELLING and then set migration error and FAILED
> state. I discussed the implementation with Peter, and we came to a
> conclusion that instead of patching such clean-up on top of the current
> error handling code, it might be more useful to do a larger refactor and
> clean-up of all error handling in the migration code.
>
> Such clean-up should reduce the number of places where we need to
> explicitly transition to a FAILED state (ideally to one, or only a
> couple of places), and instead only set an appropriate migration error
> using migrate_set_error(). Additionally, it would also refactor
> inappropriate uses of QEMUFile errors where the error is not really an
> error of the underlying channel and migrate_set_error() should be used
> instead.
Fabiano: we discussed something around the FAILED status before as well.
If you started working on something in this area, please shoot!
>
> [1]: https://lore.kernel.org/all/20250807114922.1013286-3-jmarcin@redhat.com/
> ---
> migration/migration.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 10c216d25d..32b8ce5613 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error **errp)
> fail_closefb:
> qemu_fclose(fb);
> fail:
> - migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> - MIGRATION_STATUS_FAILED);
> + if (ms->state != MIGRATION_STATUS_CANCELLING) {
> + migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> + }
> migration_block_activate(NULL);
> migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
> bql_unlock();
> --
> 2.50.1
>
--
Peter Xu
* Re: [PATCH] migration: Fix state transition in postcopy_start() error handling
2025-08-26 18:23 ` Peter Xu
@ 2025-08-26 19:00 ` Fabiano Rosas
0 siblings, 0 replies; 5+ messages in thread
From: Fabiano Rosas @ 2025-08-26 19:00 UTC
To: Peter Xu, Juraj Marcin; +Cc: qemu-devel, qemu-stable
Peter Xu <peterx@redhat.com> writes:
> On Tue, Aug 26, 2025 at 01:51:40PM +0200, Juraj Marcin wrote:
>> From: Juraj Marcin <jmarcin@redhat.com>
>>
>> Commit 48814111366b ("migration: Always set DEVICE state") introduced
>> DEVICE state to postcopy, which moved the actual state transition that
>> leads to POSTCOPY_ACTIVE.
>>
>> However, the error handling part of the postcopy_start() function still
>> expects the state POSTCOPY_ACTIVE, but depending on where an error
>> happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but
>> never POSTCOPY_ACTIVE, as this transition now happens just before a
>> successful return from the function.
>>
>> Instead, accept any state except CANCELLING when transitioning to FAILED
>> state.
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: 48814111366b ("migration: Always set DEVICE state")
>> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
>
> Thanks, Juraj!
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
>>
>> ---
>> In the RFC[1] where this patch was discussed, there was also a
>> suggestion for a helper function migrate_set_failure() that would check
>> if the state is not CANCELLING and then set migration error and FAILED
>> state. I discussed the implementation with Peter, and we came to a
>> conclusion that instead of patching such clean-up on top of the current
>> error handling code, it might be more useful to do a larger refactor and
>> clean-up of all error handling in the migration code.
>>
>> Such clean-up should reduce the number of places where we need to
>> explicitly transition to a FAILED state (ideally to one, or only a
>> couple of places), and instead only set an appropriate migration error
>> using migrate_set_error(). Additionally, it would also refactor
>> inappropriate uses of QEMUFile errors where the error is not really an
>> error of the underlying channel and migrate_set_error() should be used
>> instead.
>
> Fabiano: we discussed something around the FAILED status before as well.
> If you started working on something in this area, please shoot!
>
I don't have anything planned, it's just the thread that I already
linked in the previous version of this patch. Juraj is aware.
>>
>> [1]: https://lore.kernel.org/all/20250807114922.1013286-3-jmarcin@redhat.com/
>> ---
>> migration/migration.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 10c216d25d..32b8ce5613 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>> fail_closefb:
>> qemu_fclose(fb);
>> fail:
>> - migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> - MIGRATION_STATUS_FAILED);
>> + if (ms->state != MIGRATION_STATUS_CANCELLING) {
>> + migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
>> + }
>> migration_block_activate(NULL);
>> migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
>> bql_unlock();
>> --
>> 2.50.1
>>
* Re: [PATCH] migration: Fix state transition in postcopy_start() error handling
2025-08-26 11:51 [PATCH] migration: Fix state transition in postcopy_start() error handling Juraj Marcin
2025-08-26 18:23 ` Peter Xu
@ 2025-09-27 14:01 ` Michael Tokarev
2025-09-29 15:47 ` Peter Xu
1 sibling, 1 reply; 5+ messages in thread
From: Michael Tokarev @ 2025-09-27 14:01 UTC
To: Juraj Marcin, qemu-devel; +Cc: Fabiano Rosas, Peter Xu, qemu-stable
On 26.08.2025 14:51, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
>
> Commit 48814111366b ("migration: Always set DEVICE state") introduced
> DEVICE state to postcopy, which moved the actual state transition that
> leads to POSTCOPY_ACTIVE.
>
> However, the error handling part of the postcopy_start() function still
> expects the state POSTCOPY_ACTIVE, but depending on where an error
> happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but
> never POSTCOPY_ACTIVE, as this transition now happens just before a
> successful return from the function.
>
> Instead, accept any state except CANCELLING when transitioning to FAILED
> state.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 48814111366b ("migration: Always set DEVICE state")
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
>
> ---
> In the RFC[1] where this patch was discussed, there was also a
> suggestion for a helper function migrate_set_failure() that would check
> if the state is not CANCELLING and then set migration error and FAILED
> state. I discussed the implementation with Peter, and we came to a
> conclusion that instead of patching such clean-up on top of the current
> error handling code, it might be more useful to do a larger refactor and
> clean-up of all error handling in the migration code.
>
> Such clean-up should reduce the number of places where we need to
> explicitly transition to a FAILED state (ideally to one, or only a
> couple of places), and instead only set an appropriate migration error
> using migrate_set_error(). Additionally, it would also refactor
> inappropriate uses of QEMUFile errors where the error is not really an
> error of the underlying channel and migrate_set_error() should be used
> instead.
>
> [1]: https://lore.kernel.org/all/20250807114922.1013286-3-jmarcin@redhat.com/
Ping? Can we apply this to the master branch, so I can pick it up for
the stable series?
Thanks,
/mjt
* Re: [PATCH] migration: Fix state transition in postcopy_start() error handling
2025-09-27 14:01 ` Michael Tokarev
@ 2025-09-29 15:47 ` Peter Xu
0 siblings, 0 replies; 5+ messages in thread
From: Peter Xu @ 2025-09-29 15:47 UTC
To: Michael Tokarev; +Cc: Juraj Marcin, qemu-devel, Fabiano Rosas, qemu-stable
On Sat, Sep 27, 2025 at 05:01:11PM +0300, Michael Tokarev wrote:
> On 26.08.2025 14:51, Juraj Marcin wrote:
> > From: Juraj Marcin <jmarcin@redhat.com>
> >
> > Commit 48814111366b ("migration: Always set DEVICE state") introduced
> > DEVICE state to postcopy, which moved the actual state transition that
> > leads to POSTCOPY_ACTIVE.
> >
> > However, the error handling part of the postcopy_start() function still
> > expects the state POSTCOPY_ACTIVE, but depending on where an error
> > happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but
> > never POSTCOPY_ACTIVE, as this transition now happens just before a
> > successful return from the function.
> >
> > Instead, accept any state except CANCELLING when transitioning to FAILED
> > state.
> >
> > Cc: qemu-stable@nongnu.org
> > Fixes: 48814111366b ("migration: Always set DEVICE state")
> > Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> >
> > ---
> > In the RFC[1] where this patch was discussed, there was also a
> > suggestion for a helper function migrate_set_failure() that would check
> > if the state is not CANCELLING and then set migration error and FAILED
> > state. I discussed the implementation with Peter, and we came to a
> > conclusion that instead of patching such clean-up on top of the current
> > error handling code, it might be more useful to do a larger refactor and
> > clean-up of all error handling in the migration code.
> >
> > Such clean-up should reduce the number of places where we need to
> > explicitly transition to a FAILED state (ideally to one, or only a
> > couple of places), and instead only set an appropriate migration error
> > using migrate_set_error(). Additionally, it would also refactor
> > inappropriate uses of QEMUFile errors where the error is not really an
> > error of the underlying channel and migrate_set_error() should be used
> > instead.
> >
> > [1]: https://lore.kernel.org/all/20250807114922.1013286-3-jmarcin@redhat.com/
>
> Ping? Can we apply this to the master branch, so I can pick it up for
> the stable series?
Apologies for the delay, queued. Will send the PR this week.
--
Peter Xu