qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-9.0 0/2] migration: Two migration bug fixes
@ 2024-03-28 14:02 Avihai Horon
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon

Hello,

This small series fixes two migration bugs I stumbled upon recently.
Comments are welcome, thanks for reviewing.

Avihai Horon (2):
  migration: Set migration error in migration_completion()
  migration/postcopy: Ensure postcopy_start() sets errp if it fails

 migration/migration.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
@ 2024-03-28 14:02 ` Avihai Horon
  2024-03-28 15:09   ` Peter Xu
  2024-03-28 15:21   ` Cédric Le Goater
  2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
  2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu
  2 siblings, 2 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon

After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.

This in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.

Fix it by ensuring migration error is set in case of error in
migration_completion().

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 migration/migration.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;
 
 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
 }
 
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
  2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
@ 2024-03-28 14:02 ` Avihai Horon
  2024-03-28 15:04   ` Cédric Le Goater
  2024-03-28 15:10   ` Peter Xu
  2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu
  2 siblings, 2 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 14:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Cédric Le Goater, Avihai Horon

There are several places where postcopy_start() fails without setting
errp. This can cause a null pointer de-reference, as in case of error,
the caller of postcopy_start() copies/prints the error set in errp.

Fix it by setting errp in all of postcopy_start() error paths.

Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 migration/migration.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index b73ae3a72c4..86bf76e9258 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2510,6 +2510,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
         migration_wait_main_channel(ms);
         if (postcopy_preempt_establish_channel(ms)) {
             migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+            error_setg(errp, "%s: Failed to establish preempt channel",
+                       __func__);
             return -1;
         }
     }
@@ -2525,17 +2527,22 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed to stop the VM", __func__);
         goto fail;
     }
 
     ret = migration_maybe_pause(ms, &cur_state,
                                 MIGRATION_STATUS_POSTCOPY_ACTIVE);
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
+                         __func__);
         goto fail;
     }
 
     ret = bdrv_inactivate_all();
     if (ret < 0) {
+        error_setg_errno(errp, -ret, "%s: Failed in bdrv_inactivate_all()",
+                         __func__);
         goto fail;
     }
     restart_block = true;
@@ -2612,6 +2619,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     /* Now send that blob */
     if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
+        error_setg(errp, "%s: Failed to send packaged data", __func__);
         goto fail_closefb;
     }
     qemu_fclose(fb);
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
  2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
@ 2024-03-28 15:04   ` Cédric Le Goater
  2024-03-28 15:10   ` Peter Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 15:04 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas

On 3/28/24 15:02, Avihai Horon wrote:
> There are several places where postcopy_start() fails without setting
> errp. This can cause a null pointer de-reference, as in case of error,
> the caller of postcopy_start() copies/prints the error set in errp.
> 
> Fix it by setting errp in all of postcopy_start() error paths.
> 
> Fixes: 908927db28ea ("migration: Update error description whenever migration fails")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   migration/migration.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index b73ae3a72c4..86bf76e9258 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2510,6 +2510,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>           migration_wait_main_channel(ms);
>           if (postcopy_preempt_establish_channel(ms)) {
>               migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> +            error_setg(errp, "%s: Failed to establish preempt channel",
> +                       __func__);
>               return -1;
>           }
>       }
> @@ -2525,17 +2527,22 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>   
>       ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
>       if (ret < 0) {
> +        error_setg_errno(errp, -ret, "%s: Failed to stop the VM", __func__);
>           goto fail;
>       }
>   
>       ret = migration_maybe_pause(ms, &cur_state,
>                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
>       if (ret < 0) {
> +        error_setg_errno(errp, -ret, "%s: Failed in migration_maybe_pause()",
> +                         __func__);
>           goto fail;
>       }
>   
>       ret = bdrv_inactivate_all();
>       if (ret < 0) {
> +        error_setg_errno(errp, -ret, "%s: Failed in bdrv_inactivate_all()",
> +                         __func__);
>           goto fail;
>       }
>       restart_block = true;
> @@ -2612,6 +2619,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>   
>       /* Now send that blob */
>       if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
> +        error_setg(errp, "%s: Failed to send packaged data", __func__);
>           goto fail_closefb;
>       }
>       qemu_fclose(fb);



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
@ 2024-03-28 15:09   ` Peter Xu
  2024-03-28 15:54     ` Avihai Horon
  2024-03-28 15:21   ` Cédric Le Goater
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Xu @ 2024-03-28 15:09 UTC (permalink / raw)
  To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater

On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
> 
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set.
> 
> This in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.
> 
> Fix it by ensuring migration error is set in case of error in
> migration_completion().
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

I'll attach this if it looks all right to you:

Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")

Thanks,

> ---
>  migration/migration.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>  {
>      int ret = 0;
>      int current_active_state = s->state;
> +    Error *local_err = NULL;
>  
>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>          ret = migration_completion_precopy(s, &current_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>      return;
>  
>  fail:
> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    } else if (ret) {
> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    }
> +
>      migration_completion_failed(s, current_active_state);
>  }
>  
> -- 
> 2.26.3
> 
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails
  2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
  2024-03-28 15:04   ` Cédric Le Goater
@ 2024-03-28 15:10   ` Peter Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-03-28 15:10 UTC (permalink / raw)
  To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater

On Thu, Mar 28, 2024 at 04:02:52PM +0200, Avihai Horon wrote:
> There are several places where postcopy_start() fails without setting
> errp. This can cause a null pointer de-reference, as in case of error,
> the caller of postcopy_start() copies/prints the error set in errp.
> 
> Fix it by setting errp in all of postcopy_start() error paths.
> 
> Fixes: 908927db28ea ("migration: Update error description whenever migration fails")

I think this should need:

Cc: qemu-stable <qemu-stable@nongnu.org>

> Signed-off-by: Avihai Horon <avihaih@nvidia.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
  2024-03-28 15:09   ` Peter Xu
@ 2024-03-28 15:21   ` Cédric Le Goater
  2024-03-28 15:50     ` Avihai Horon
  1 sibling, 1 reply; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 15:21 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas

Hello Avihai,

On 3/28/24 15:02, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
> 
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set

Out of curiosity, could you describe a bit more the context ? Did
vfio_save_complete_precopy() fail ? why ?

We should propagate errors of .save_live_complete_precopy() handlers as
it was done .save_setup handlers(). For 9.1.

> This in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.

yes, but this test :

     if (ret < 0) {
         goto fail;
     }

will skip the close_return_path_on_source() call. Won't it ? So I don't
understand how it can be an issue. Am I missing something ?

> Fix it by ensuring migration error is set in case of error in
> migration_completion().

Why didn't you add a reference to commit 9425ef3f990a ?


> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>   migration/migration.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>   {
>       int ret = 0;
>       int current_active_state = s->state;
> +    Error *local_err = NULL;
>   
>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>           ret = migration_completion_precopy(s, &current_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>       return;
>   
>   fail:
> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    } else if (ret) {
> +        error_setg_errno(&local_err, -ret, "Error in migration completion");

The 'ret = -1' case could be improved with error_setg(). As a followup.

Thanks,

C.




> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    }
> +
>       migration_completion_failed(s, current_active_state);
>   }
>   



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 15:21   ` Cédric Le Goater
@ 2024-03-28 15:50     ` Avihai Horon
  2024-03-28 16:29       ` Cédric Le Goater
  0 siblings, 1 reply; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 15:50 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel; +Cc: Peter Xu, Fabiano Rosas


On 28/03/2024 17:21, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> Hello Avihai,
>
> On 3/28/24 15:02, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set
>
> Out of curiosity, could you describe a bit more the context ? Did
> vfio_save_complete_precopy() fail ? why ?

Yep, vfio_save_complete_precopy() failed (but it failed while I was 
experimenting with an unofficial debug FW).

>
> We should propagate errors of .save_live_complete_precopy() handlers as
> it was done .save_setup handlers(). For 9.1.

Agreed.

>
>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>
> yes, but this test :
>
>     if (ret < 0) {
>         goto fail;
>     }
>
> will skip the close_return_path_on_source() call. Won't it ? So I don't
> understand how it can be an issue. Am I missing something ?

It will skip the close_return_path_on_source() call in 
migration_completion(), but there is another 
close_return_path_on_source() call in migrate_fd_cleanup().

>
>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>
> Why didn't you add a reference to commit 9425ef3f990a ?

I thought this commit didn't introduce this bug, but looking again in 
the mailing list [1], it kinda did:
The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the 
return path thread before releasing to_dst_file") and then 9425ef3f990a 
re-introduced the bug, but only for migration_completion() case.
So, you are right, a fixes line with 9425ef3f990a should be added.

Thanks.

[1] https://lore.kernel.org/all/20240226203122.22894-1-farosas@suse.de/

>
>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   migration/migration.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState 
>> *s)
>>   {
>>       int ret = 0;
>>       int current_active_state = s->state;
>> +    Error *local_err = NULL;
>>
>>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>>           ret = migration_completion_precopy(s, &current_active_state);
>> @@ -2832,6 +2833,15 @@ static void 
>> migration_completion(MigrationState *s)
>>       return;
>>
>>   fail:
>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    } else if (ret) {
>> +        error_setg_errno(&local_err, -ret, "Error in migration 
>> completion");
>
> The 'ret = -1' case could be improved with error_setg(). As a followup.
>
> Thanks,
>
> C.
>
>
>
>
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    }
>> +
>>       migration_completion_failed(s, current_active_state);
>>   }
>>
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 15:09   ` Peter Xu
@ 2024-03-28 15:54     ` Avihai Horon
  0 siblings, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2024-03-28 15:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater


On 28/03/2024 17:09, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set.
>>
>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>>
>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> I'll attach this if it looks all right to you:
>
> Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")

Yes, sure, go ahead.

Thanks.

>
> Thanks,
>
>> ---
>>   migration/migration.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>   {
>>       int ret = 0;
>>       int current_active_state = s->state;
>> +    Error *local_err = NULL;
>>
>>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>>           ret = migration_completion_precopy(s, &current_active_state);
>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>       return;
>>
>>   fail:
>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    } else if (ret) {
>> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    }
>> +
>>       migration_completion_failed(s, current_active_state);
>>   }
>>
>> --
>> 2.26.3
>>
>>
> --
> Peter Xu
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
  2024-03-28 15:50     ` Avihai Horon
@ 2024-03-28 16:29       ` Cédric Le Goater
  0 siblings, 0 replies; 11+ messages in thread
From: Cédric Le Goater @ 2024-03-28 16:29 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Fabiano Rosas

On 3/28/24 16:50, Avihai Horon wrote:
> 
> On 28/03/2024 17:21, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Avihai,
>>
>> On 3/28/24 15:02, Avihai Horon wrote:
>>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>>> close_return_path_on_source()"), close_return_path_on_source() assumes
>>> that migration error is set if an error occurs during migration.
>>>
>>> This may not be true if migration errors in migration_completion(). For
>>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>>> will not be set
>>
>> Out of curiosity, could you describe a bit more the context ? Did
>> vfio_save_complete_precopy() fail ? why ?
> 
> Yep, vfio_save_complete_precopy() failed (but it failed while I was experimenting with an unofficial debug FW).
> 
>>
>> We should propagate errors of .save_live_complete_precopy() handlers as
>> it was done .save_setup handlers(). For 9.1.
> 
> Agreed.
> 
>>
>>> This in turn, will cause a migration hang bug, similar to the bug that
>>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>>> thread before releasing to_dst_file"), as shutdown() will not be issued
>>> for the return-path channel.
>>
>> yes, but this test :
>>
>>     if (ret < 0) {
>>         goto fail;
>>     }
>>
>> will skip the close_return_path_on_source() call. Won't it ? So I don't
>> understand how it can be an issue. Am I missing something ?
> 
> It will skip the close_return_path_on_source() call in migration_completion(), but there is another close_return_path_on_source() call in migrate_fd_cleanup().

OK. Found it. This is a code path I hadn't explored yet.

Acked-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> 
>>
>>> Fix it by ensuring migration error is set in case of error in
>>> migration_completion().
>>
>> Why didn't you add a reference to commit 9425ef3f990a ?
> 
> I thought this commit didn't introduce this bug, but looking again in the mailing list [1], it kinda did:
> The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the return path thread before releasing to_dst_file") and then 9425ef3f990a re-introduced the bug, but only for migration_completion() case.
> So, you are right, a fixes line with 9425ef3f990a should be added.
> 
> Thanks.
> 
> [1] https://lore.kernel.org/all/20240226203122.22894-1-farosas@suse.de/
> 
>>
>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>>   migration/migration.c | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 9fe8fd2afd7..b73ae3a72c4 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>>   {
>>>       int ret = 0;
>>>       int current_active_state = s->state;
>>> +    Error *local_err = NULL;
>>>
>>>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>>>           ret = migration_completion_precopy(s, &current_active_state);
>>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>>       return;
>>>
>>>   fail:
>>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +    } else if (ret) {
>>> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
>>
>> The 'ret = -1' case could be improved with error_setg(). As a followup.
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>>
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +    }
>>> +
>>>       migration_completion_failed(s, current_active_state);
>>>   }
>>>
>>
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-9.0 0/2] migration: Two migration bug fixes
  2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
  2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
  2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
@ 2024-03-28 17:04 ` Peter Xu
  2 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2024-03-28 17:04 UTC (permalink / raw)
  To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas, Cédric Le Goater

On Thu, Mar 28, 2024 at 04:02:50PM +0200, Avihai Horon wrote:
> Hello,
> 
> This small series fixes two migration bugs I stumbled upon recently.
> Comments are welcome, thanks for reviewing.
> 
> Avihai Horon (2):
>   migration: Set migration error in migration_completion()
>   migration/postcopy: Ensure postcopy_start() sets errp if it fails

queued for 9.0-rc2, thanks.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-03-28 17:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
2024-03-28 15:09   ` Peter Xu
2024-03-28 15:54     ` Avihai Horon
2024-03-28 15:21   ` Cédric Le Goater
2024-03-28 15:50     ` Avihai Horon
2024-03-28 16:29       ` Cédric Le Goater
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
2024-03-28 15:04   ` Cédric Le Goater
2024-03-28 15:10   ` Peter Xu
2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).