* [PATCH for-9.0 0/2] migration: Two migration bug fixes
@ 2024-03-28 14:02 Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon
Hello,
This small series fixes two migration bugs I stumbled upon recently.
Comments are welcome, thanks for reviewing.
Avihai Horon (2):
migration: Set migration error in migration_completion()
migration/postcopy: Ensure postcopy_start() sets errp if it fails
migration/migration.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
--
2.26.3
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
@ 2024-03-28 14:02 ` Avihai Horon
2024-03-28 15:09 ` Peter Xu
2024-03-28 15:21 ` Cédric Le Goater
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu
2 siblings, 2 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon
After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.
This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.
This in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.
Fix it by ensuring migration error is set in case of error in
migration_completion().
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/migration.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
{
int ret = 0;
int current_active_state = s->state;
+ Error *local_err = NULL;
if (s->state == MIGRATION_STATUS_ACTIVE) {
ret = migration_completion_precopy(s, ¤t_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
return;
fail:
+ if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+ migrate_set_error(s, local_err);
+ error_free(local_err);
+ } else if (ret) {
+ error_setg_errno(&local_err, -ret, "Error in migration completion");
+ migrate_set_error(s, local_err);
+ error_free(local_err);
+ }
+
migration_completion_failed(s, current_active_state);
}
--
2.26.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
@ 2024-03-28 14:02 ` Avihai Horon
2024-03-28 15:04 ` Cédric Le Goater
2024-03-28 15:10 ` Peter Xu
2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu
2 siblings, 2 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon
There are several places where postcopy_start() fails without setting
errp. This can cause a null pointer de-reference, as in case of error,
the caller of postcopy_start() copies/prints the error set in errp.
Fix it by setting errp in all of postcopy_start() error paths.
Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/migration.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index b73ae3a72c4..86bf76e9258 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2510,6 +2510,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
migration_wait_main_channel(ms);
if (postcopy_preempt_establish_channel(ms)) {
migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+ error_setg(errp, "%s: Failed to establish preempt channel",
+ __func__);
return -1;
}
}
@@ -2525,17 +2527,22 @@ static int postcopy_start(MigrationState *ms, Error **errp)
ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
if (ret < 0) {
+ error_setg_errno(errp, -ret, "%s: Failed to stop the VM", __func__);
goto fail;
}
ret = migration_maybe_pause(ms, &cur_state,
MIGRATION_STATUS_POSTCOPY_ACTIVE);
if (ret < 0) {
+ error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
+ __func__);
goto fail;
}
ret = bdrv_inactivate_all();
if (ret < 0) {
+ error_setg_errno(errp, -ret, "%s: Failed in bdrv_inactivate_all()",
+ __func__);
goto fail;
}
restart_block = true;
@@ -2612,6 +2619,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
/* Now send that blob */
if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
+ error_setg(errp, "%s: Failed to send packaged data", __func__);
goto fail_closefb;
}
qemu_fclose(fb);
--
2.26.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
@ 2024-03-28 15:04 ` Cédric Le Goater
2024-03-28 15:10 ` Peter Xu
1 sibling, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 15:04 UTC (permalink / raw)
To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas
On 3/28/24 15:02, Avihai Horon wrote:
> There are several places where postcopy_start() fails without setting
> errp. This can cause a null pointer de-reference, as in case of error,
> the caller of postcopy_start() copies/prints the error set in errp.
>
> Fix it by setting errp in all of postcopy_start() error paths.
>
> Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> migration/migration.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index b73ae3a72c4..86bf76e9258 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2510,6 +2510,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
> migration_wait_main_channel(ms);
> if (postcopy_preempt_establish_channel(ms)) {
> migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> + error_setg(errp, "%s: Failed to establish preempt channel",
> + __func__);
> return -1;
> }
> }
> @@ -2525,17 +2527,22 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>
> ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
> if (ret < 0) {
> + error_setg_errno(errp, -ret, "%s: Failed to stop the VM", __func__);
> goto fail;
> }
>
> ret = migration_maybe_pause(ms, &cur_state,
> MIGRATION_STATUS_POSTCOPY_ACTIVE);
> if (ret < 0) {
> + error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
> + __func__);
> goto fail;
> }
>
> ret = bdrv_inactivate_all();
> if (ret < 0) {
> + error_setg_errno(errp, -ret, "%s: Failed in bdrv_inactivate_all()",
> + __func__);
> goto fail;
> }
> restart_block = true;
> @@ -2612,6 +2619,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>
> /* Now send that blob */
> if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
> + error_setg(errp, "%s: Failed to send packaged data", __func__);
> goto fail_closefb;
> }
> qemu_fclose(fb);
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
@ 2024-03-28 15:09 ` Peter Xu
2024-03-28 15:54 ` Avihai Horon
2024-03-28 15:21 ` Cédric Le Goater
1 sibling, 1 reply; 11+ messages in thread
From: Peter Xu @ 2024-03-28 15:09 UTC (permalink / raw)
To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater
On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
>
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set.
>
> This in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.
>
> Fix it by ensuring migration error is set in case of error in
> migration_completion().
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
I'll attach this if it looks all right to you:
Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")
Thanks,
> ---
> migration/migration.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
> {
> int ret = 0;
> int current_active_state = s->state;
> + Error *local_err = NULL;
>
> if (s->state == MIGRATION_STATUS_ACTIVE) {
> ret = migration_completion_precopy(s, ¤t_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
> return;
>
> fail:
> + if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> + migrate_set_error(s, local_err);
> + error_free(local_err);
> + } else if (ret) {
> + error_setg_errno(&local_err, -ret, "Error in migration completion");
> + migrate_set_error(s, local_err);
> + error_free(local_err);
> + }
> +
> migration_completion_failed(s, current_active_state);
> }
>
> --
> 2.26.3
>
>
--
Peter Xu
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
2024-03-28 15:04 ` Cédric Le Goater
@ 2024-03-28 15:10 ` Peter Xu
1 sibling, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-03-28 15:10 UTC (permalink / raw)
To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater
On Thu, Mar 28, 2024 at 04:02:52PM +0200, Avihai Horon wrote:
> There are several places where postcopy_start() fails without setting
> errp. This can cause a null pointer de-reference, as in case of error,
> the caller of postcopy_start() copies/prints the error set in errp.
>
> Fix it by setting errp in all of postcopy_start() error paths.
>
> Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
I think this should need:
Cc: qemu-stable <qemu-stable@nongnu.org>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
2024-03-28 15:09 ` Peter Xu
@ 2024-03-28 15:21 ` Cédric Le Goater
2024-03-28 15:50 ` Avihai Horon
1 sibling, 1 reply; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 15:21 UTC (permalink / raw)
To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas
Hello Avihai,
On 3/28/24 15:02, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
>
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set
Out of curiosity, could you describe a bit more the context ? Did
vfio_save_complete_precopy() fail ? why ?
We should propagate errors of .save_live_complete_precopy() handlers as
it was done .save_setup handlers(). For 9.1.
> This in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.
yes, but this test :
if (ret < 0) {
goto fail;
}
will skip the close_return_path_on_source() call. Won't it ? So I don't
understand how it can be an issue. Am I missing something ?
> Fix it by ensuring migration error is set in case of error in
> migration_completion().
Why didn't you add a reference to commit 9425ef3f990a ?
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> migration/migration.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
> {
> int ret = 0;
> int current_active_state = s->state;
> + Error *local_err = NULL;
>
> if (s->state == MIGRATION_STATUS_ACTIVE) {
> ret = migration_completion_precopy(s, ¤t_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
> return;
>
> fail:
> + if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> + migrate_set_error(s, local_err);
> + error_free(local_err);
> + } else if (ret) {
> + error_setg_errno(&local_err, -ret, "Error in migration completion");
The 'ret = -1' case could be improved with error_setg(). As a followup.
Thanks,
C.
> + migrate_set_error(s, local_err);
> + error_free(local_err);
> + }
> +
> migration_completion_failed(s, current_active_state);
> }
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
2024-03-28 15:21 ` Cédric Le Goater
@ 2024-03-28 15:50 ` Avihai Horon
2024-03-28 16:29 ` Cédric Le Goater
0 siblings, 1 reply; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 15:50 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel; +Cc: Peter Xu, Fabiano Rosas
On 28/03/2024 17:21, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> Hello Avihai,
>
> On 3/28/24 15:02, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set
>
> Out of curiosity, could you describe a bit more the context ? Did
> vfio_save_complete_precopy() fail ? why ?
Yep, vfio_save_complete_precopy() failed (but it failed while I was
experimenting with an unofficial debug FW).
>
> We should propagate errors of .save_live_complete_precopy() handlers as
> it was done .save_setup handlers(). For 9.1.
Agreed.
>
>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>
> yes, but this test :
>
> if (ret < 0) {
> goto fail;
> }
>
> will skip the close_return_path_on_source() call. Won't it ? So I don't
> understand how it can be an issue. Am I missing something ?
It will skip the close_return_path_on_source() call in
migration_completion(), but there is another
close_return_path_on_source() call in migrate_fd_cleanup().
>
>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>
> Why didn't you add a reference to commit 9425ef3f990a ?
I thought this commit didn't introduce this bug, but looking again in
the mailing list [1], it kinda did:
The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the
return path thread before releasing to_dst_file") and then 9425ef3f990a
re-introduced the bug, but only for migration_completion() case.
So, you are right, a fixes line with 9425ef3f990a should be added.
Thanks.
[1] https://lore.kernel.org/all/20240226203122.22894-1-farosas@suse.de/
>
>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>> migration/migration.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState
>> *s)
>> {
>> int ret = 0;
>> int current_active_state = s->state;
>> + Error *local_err = NULL;
>>
>> if (s->state == MIGRATION_STATUS_ACTIVE) {
>> ret = migration_completion_precopy(s, ¤t_active_state);
>> @@ -2832,6 +2833,15 @@ static void
>> migration_completion(MigrationState *s)
>> return;
>>
>> fail:
>> + if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> + migrate_set_error(s, local_err);
>> + error_free(local_err);
>> + } else if (ret) {
>> + error_setg_errno(&local_err, -ret, "Error in migration
>> completion");
>
> The 'ret = -1' case could be improved with error_setg(). As a followup.
>
> Thanks,
>
> C.
>
>
>
>
>> + migrate_set_error(s, local_err);
>> + error_free(local_err);
>> + }
>> +
>> migration_completion_failed(s, current_active_state);
>> }
>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
2024-03-28 15:09 ` Peter Xu
@ 2024-03-28 15:54 ` Avihai Horon
0 siblings, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 15:54 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater
On 28/03/2024 17:09, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set.
>>
>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>>
>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> I'll attach this if it looks all right to you:
>
> Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")
Yes, sure, go ahead.
Thanks.
>
> Thanks,
>
>> ---
>> migration/migration.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>> {
>> int ret = 0;
>> int current_active_state = s->state;
>> + Error *local_err = NULL;
>>
>> if (s->state == MIGRATION_STATUS_ACTIVE) {
>> ret = migration_completion_precopy(s, ¤t_active_state);
>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>> return;
>>
>> fail:
>> + if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> + migrate_set_error(s, local_err);
>> + error_free(local_err);
>> + } else if (ret) {
>> + error_setg_errno(&local_err, -ret, "Error in migration completion");
>> + migrate_set_error(s, local_err);
>> + error_free(local_err);
>> + }
>> +
>> migration_completion_failed(s, current_active_state);
>> }
>>
>> --
>> 2.26.3
>>
>>
> --
> Peter Xu
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
2024-03-28 15:50 ` Avihai Horon
@ 2024-03-28 16:29 ` Cédric Le Goater
0 siblings, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 16:29 UTC (permalink / raw)
To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas
On 3/28/24 16:50, Avihai Horon wrote:
>
> On 28/03/2024 17:21, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Avihai,
>>
>> On 3/28/24 15:02, Avihai Horon wrote:
>>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>>> close_return_path_on_source()"), close_return_path_on_source() assumes
>>> that migration error is set if an error occurs during migration.
>>>
>>> This may not be true if migration errors in migration_completion(). For
>>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>>> will not be set
>>
>> Out of curiosity, could you describe a bit more the context ? Did
>> vfio_save_complete_precopy() fail ? why ?
>
> Yep, vfio_save_complete_precopy() failed (but it failed while I was experimenting with an unofficial debug FW).
>
>>
>> We should propagate errors of .save_live_complete_precopy() handlers as
>> it was done .save_setup handlers(). For 9.1.
>
> Agreed.
>
>>
>>> This in turn, will cause a migration hang bug, similar to the bug that
>>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>>> thread before releasing to_dst_file"), as shutdown() will not be issued
>>> for the return-path channel.
>>
>> yes, but this test :
>>
>> if (ret < 0) {
>> goto fail;
>> }
>>
>> will skip the close_return_path_on_source() call. Won't it ? So I don't
>> understand how it can be an issue. Am I missing something ?
>
> It will skip the close_return_path_on_source() call in migration_completion(), but there is another close_return_path_on_source() call in migrate_fd_cleanup().
OK. Found it. This is a code path I hadn't explored yet.
Acked-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
>
>>
>>> Fix it by ensuring migration error is set in case of error in
>>> migration_completion().
>>
>> Why didn't you add a reference to commit 9425ef3f990a ?
>
> I thought this commit didn't introduce this bug, but looking again in the mailing list [1], it kinda did:
> The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the return path thread before releasing to_dst_file") and then 9425ef3f990a re-introduced the bug, but only for migration_completion() case.
> So, you are right, a fixes line with 9425ef3f990a should be added.
>
> Thanks.
>
> [1] https://lore.kernel.org/all/20240226203122.22894-1-farosas@suse.de/
>
>>
>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>> migration/migration.c | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 9fe8fd2afd7..b73ae3a72c4 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>> {
>>> int ret = 0;
>>> int current_active_state = s->state;
>>> + Error *local_err = NULL;
>>>
>>> if (s->state == MIGRATION_STATUS_ACTIVE) {
>>> ret = migration_completion_precopy(s, ¤t_active_state);
>>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>> return;
>>>
>>> fail:
>>> + if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>>> + migrate_set_error(s, local_err);
>>> + error_free(local_err);
>>> + } else if (ret) {
>>> + error_setg_errno(&local_err, -ret, "Error in migration completion");
>>
>> The 'ret = -1' case could be improved with error_setg(). As a followup.
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>>
>>> + migrate_set_error(s, local_err);
>>> + error_free(local_err);
>>> + }
>>> +
>>> migration_completion_failed(s, current_active_state);
>>> }
>>>
>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-9.0 0/2] migration: Two migration bug fixes
2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
@ 2024-03-28 17:04 ` Peter Xu
2 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-03-28 17:04 UTC (permalink / raw)
To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater
On Thu, Mar 28, 2024 at 04:02:50PM +0200, Avihai Horon wrote:
> Hello,
>
> This small series fixes two migration bugs I stumbled upon recently.
> Comments are welcome, thanks for reviewing.
>
> Avihai Horon (2):
> migration: Set migration error in migration_completion()
> migration/postcopy: Ensure postcopy_start() sets errp if it fails
queued for 9.0-rc2, thanks.
--
Peter Xu
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-03-28 17:06 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
2024-03-28 15:09 ` Peter Xu
2024-03-28 15:54 ` Avihai Horon
2024-03-28 15:21 ` Cédric Le Goater
2024-03-28 15:50 ` Avihai Horon
2024-03-28 16:29 ` Cédric Le Goater
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
2024-03-28 15:04 ` Cédric Le Goater
2024-03-28 15:10 ` Peter Xu
2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).