[Question] VFIO migration will not be aborted in a corner scenario

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Question] VFIO migration will not be aborted in a corner scenario
@ 2025-08-11 16:02 Kunkun Jiang via
  2025-08-11 16:34 ` Cédric Le Goater
  0 siblings, 1 reply; 11+ messages in thread
From: Kunkun Jiang via @ 2025-08-11 16:02 UTC (permalink / raw)
  To: Alex Williamson, Yishai Hadas, clg
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu

Hi all,

While testing VFIO migration, I encountered an corner scenario case:
VFIO migration will not be aborted when the vfio device of dst-vm fails 
to transition from RESUMING to RUNNING state in vfio_vmstate_change.

I saw the comments in the vfio_vmstate_change but I don't understand why 
no action is taken for this situation. Allowing the live migration 
process to continue could cause unrecoverable damage to the VM. In this 
case, can we directly exit the dst-vm? Through the return-path 
mechanism, the src-vm can continue to run.

Looking forward to your reply.

Thanks,
Kunkun Jiang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-11 16:02 [Question] VFIO migration will not be aborted in a corner scenario Kunkun Jiang via
@ 2025-08-11 16:34 ` Cédric Le Goater
  2025-08-12 14:08   ` Avihai Horon
  0 siblings, 1 reply; 11+ messages in thread
From: Cédric Le Goater @ 2025-08-11 16:34 UTC (permalink / raw)
  To: Kunkun Jiang, Alex Williamson, Yishai Hadas
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu,
	'Avihai Horon'

Hello,

+ Avihai

On 8/11/25 18:02, Kunkun Jiang wrote:
> Hi all,
> 
> While testing VFIO migration, I encountered an corner scenario case:
> VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
> 
> I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation. 

There is error handling in vfio_vmstate_change() :

         /*
          * Migration should be aborted in this case, but vm_state_notify()
          * currently does not support reporting failures.
          */
         migration_file_set_error(ret, local_err);

> Allowing the live migration process to continue could cause unrecoverable damage to the VM. In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
> 
> Looking forward to your reply.

I suggest you open an issue on :

   https://gitlab.com/qemu-project/qemu/-/issues/

with a detailed description of your environment :
   
   Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.

A template is provided when a new issue is created.


Thanks,

C.
  




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-11 16:34 ` Cédric Le Goater
@ 2025-08-12 14:08   ` Avihai Horon
  2025-08-12 14:34     ` Cédric Le Goater
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Avihai Horon @ 2025-08-12 14:08 UTC (permalink / raw)
  To: Cédric Le Goater, Kunkun Jiang, Alex Williamson,
	Yishai Hadas
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu


On 11/08/2025 19:34, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> Hello,
>
> + Avihai
>
> On 8/11/25 18:02, Kunkun Jiang wrote:
>> Hi all,
>>
>> While testing VFIO migration, I encountered an corner scenario case:
>> VFIO migration will not be aborted when the vfio device of dst-vm 
>> fails to transition from RESUMING to RUNNING state in 
>> vfio_vmstate_change.
>>
>> I saw the comments in the vfio_vmstate_change but I don't understand 
>> why no action is taken for this situation.
>
> There is error handling in vfio_vmstate_change() :
>
>         /*
>          * Migration should be aborted in this case, but 
> vm_state_notify()
>          * currently does not support reporting failures.
>          */
>         migration_file_set_error(ret, local_err);

Hmm, I think this only sets the error on src. On dst we don't have 
MigrationState->to_dst_file, so we end up just reporting the error.
But even if we did set it, no one is checking if there is a migration 
error after vm_start() is called in process_incoming_migration_bh().

>
>> Allowing the live migration process to continue could cause 
>> unrecoverable damage to the VM.

What do you mean by unrecoverable damage to the VM?
If RESUMING->RUNNING transition fails, would a VFIO reset recover the 
device and allow the VM to continue operation with damage limited only 
to the VFIO device?

>> In this case, can we directly exit the dst-vm? Through the 
>> return-path mechanism, the src-vm can continue to run.
>>
>> Looking forward to your reply.
>
The straightforward solution, as you suggested, is to exit dst upon 
error in RESUMING->RUNNING transition and notify about it to src through 
the return-path.
However, I am not sure if failing the migration after vm_start() on dst 
is a bit late (as we start vCPUs and do migration_block_activate, etc.).

But I can think of another way to solve this, hopefully simpler.
According to VFIO migration uAPI [1]:
  * RESUMING -> STOP
  *   Leaving RESUMING terminates a data transfer session and indicates the
  *   device should complete processing of the data delivered by 
write(). The
  *   kernel migration driver should complete the incorporation of data 
written
  *   to the data transfer FD into the device internal state and perform
  *   final validity and consistency checking of the new device state. 
If the
  *   user provided data is found to be incomplete, inconsistent, or 
otherwise
  *   invalid, the migration driver must fail the SET_STATE ioctl and
  *   optionally go to the ERROR state as described below.

So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the 
device config is loaded (which is the last data the device is expected 
to receive).
If this transition fails, it means something was wrong with migration, 
and we can send src an error msg via return-path (and not continue to 
vm_start()).

Maybe this approach is less complicated than the first one, and it will 
also work if src VM was paused prior migration.
I already tested some POC and it seems to be working (at least with an 
artificial error i injected in RESUMING->STOP transition).
Kunkun, can you apply the following diff [3] and check if this solves 
the issue?

And in general, what do you think? Should we go with this approach or do 
you have other ideas?

Thanks.

[1] 
https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
[2] Today RESUMING->STOP is done implicitly by the VFIO driver as part 
of RESUMING->RUNNING transition.
[3]

diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index e4785031a7..66f8461f02 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -267,6 +267,12 @@ static bool 
vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
      ret = vfio_load_device_config_state(f_in, vbasedev);
      bql_unlock();

+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+                                   VFIO_DEVICE_STATE_ERROR, errp);
+    if (ret) {
+        return false;
+    }
+
      if (ret < 0) {
          error_setg(errp, "%s: vfio_load_device_config_state() failed: %d",
                     vbasedev->name, ret);

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 4c06e3db93..a707d17a5b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void 
*opaque, int version_id)
          switch (data) {
          case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
          {
+            Error *local_err = NULL;
+
              if (vfio_multifd_transfer_enabled(vbasedev)) {
                  error_report("%s: got DEV_CONFIG_STATE in main migration "
                               "channel but doing multifd transfer",
@@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void 
*opaque, int version_id)
                  return -EINVAL;
              }

-            return vfio_load_device_config_state(f, opaque);
+            ret = vfio_load_device_config_state(f, opaque);
+            if (ret) {
+                return ret;
+            }
+
+            ret = vfio_migration_set_state_or_reset(
+                vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
+            if (ret) {
+                error_report_err(local_err);
+                return ret;
+            }
+
+            return 0;
          }
          case VFIO_MIG_FLAG_DEV_SETUP_STATE:
          {

diff --git a/migration/migration.c b/migration/migration.c
index 10c216d25d..fd498c864d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -91,6 +91,7 @@ enum mig_rp_message_type {
      MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
      MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
      MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
+    MIG_RP_MSG_ERROR, /* Tell source that destination encountered an 
error */

      MIG_RP_MSG_MAX
  };
@@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
      ret = qemu_loadvm_state(mis->from_src_file);
      mis->loadvm_co = NULL;

+    if (ret) {
+        migrate_send_rp_error(mis);
+        error_report("SENT RP ERROR");
+    }
+
  trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");

      ps = postcopy_state_get();
@@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
      return true;
  }
+int migrate_send_rp_error(MigrationIncomingState *mis)
+{
+    return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
+}
+
  int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
  {
      return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, 
NULL);
@@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
              trace_source_return_path_thread_switchover_acked();
              break;

+        case MIG_RP_MSG_ERROR:
+            error_setg(&err, "DST indicated error");
+            goto out;
+
          default:
              break;
          }
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..f11ff7a199 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -553,6 +553,7 @@ void 
migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
                                   char *block_name);
  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t 
value);
  int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
+int migrate_send_rp_error(MigrationIncomingState *mis);

  void dirty_bitmap_mig_before_vm_start(void);
  void dirty_bitmap_mig_cancel_outgoing(void);

> I suggest you open an issue on :
>
>   https://gitlab.com/qemu-project/qemu/-/issues/
>
> with a detailed description of your environment :
>
>   Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>
> A template is provided when a new issue is created.
>
>
> Thanks,
>
> C.
>
>
>


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-12 14:08   ` Avihai Horon
@ 2025-08-12 14:34     ` Cédric Le Goater
  2025-08-12 14:58       ` Peter Xu
  2025-08-12 14:56     ` Cédric Le Goater
  2025-08-13 12:18     ` Kunkun Jiang via
  2 siblings, 1 reply; 11+ messages in thread
From: Cédric Le Goater @ 2025-08-12 14:34 UTC (permalink / raw)
  To: Avihai Horon, Kunkun Jiang, Alex Williamson, Yishai Hadas
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu,
	'Peter Xu', Fabiano Rosas

+peter
+fabiano

On 8/12/25 16:08, Avihai Horon wrote:
> 
> On 11/08/2025 19:34, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello,
>>
>> + Avihai
>>
>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>> Hi all,
>>>
>>> While testing VFIO migration, I encountered an corner scenario case:
>>> VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
>>>
>>> I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation.
>>
>> There is error handling in vfio_vmstate_change() :
>>
>>         /*
>>          * Migration should be aborted in this case, but vm_state_notify()
>>          * currently does not support reporting failures.
>>          */
>>         migration_file_set_error(ret, local_err);
> 
> Hmm, I think this only sets the error on src. On dst we don't have MigrationState->to_dst_file, so we end up just reporting the error.
> But even if we did set it, no one is checking if there is a migration error after vm_start() is called in process_incoming_migration_bh().
> 
>>
>>> Allowing the live migration process to continue could cause unrecoverable damage to the VM.
> 
> What do you mean by unrecoverable damage to the VM?
> If RESUMING->RUNNING transition fails, would a VFIO reset recover the device and allow the VM to continue operation with damage limited only to the VFIO device?
> 
>>> In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
>>>
>>> Looking forward to your reply.
>>
> The straightforward solution, as you suggested, is to exit dst upon error in RESUMING->RUNNING transition and notify about it to src through the return-path.
> However, I am not sure if failing the migration after vm_start() on dst is a bit late (as we start vCPUs and do migration_block_activate, etc.).
> 
> But I can think of another way to solve this, hopefully simpler.
> According to VFIO migration uAPI [1]:
>   * RESUMING -> STOP
>   *   Leaving RESUMING terminates a data transfer session and indicates the
>   *   device should complete processing of the data delivered by write(). The
>   *   kernel migration driver should complete the incorporation of data written
>   *   to the data transfer FD into the device internal state and perform
>   *   final validity and consistency checking of the new device state. If the
>   *   user provided data is found to be incomplete, inconsistent, or otherwise
>   *   invalid, the migration driver must fail the SET_STATE ioctl and
>   *   optionally go to the ERROR state as described below.
> 
> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the device config is loaded (which is the last data the device is expected to receive).
> If this transition fails, it means something was wrong with migration, and we can send src an error msg via return-path (and not continue to vm_start()).
> 
> Maybe this approach is less complicated than the first one, and it will also work if src VM was paused prior migration.
> I already tested some POC and it seems to be working (at least with an artificial error i injected in RESUMING->STOP transition).
> Kunkun, can you apply the following diff [3] and check if this solves the issue?
> 
> And in general, what do you think? Should we go with this approach or do you have other ideas?
> 
> Thanks.
> 
> [1] https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part of RESUMING->RUNNING transition.
> [3]
> 
> diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
> index e4785031a7..66f8461f02 100644
> --- a/hw/vfio/migration-multifd.c
> +++ b/hw/vfio/migration-multifd.c
> @@ -267,6 +267,12 @@ static bool vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
>       ret = vfio_load_device_config_state(f_in, vbasedev);
>       bql_unlock();
> 
> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> +                                   VFIO_DEVICE_STATE_ERROR, errp);
> +    if (ret) {
> +        return false;
> +    }
> +
>       if (ret < 0) {
>           error_setg(errp, "%s: vfio_load_device_config_state() failed: %d",
>                      vbasedev->name, ret);
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 4c06e3db93..a707d17a5b 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>           switch (data) {
>           case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>           {
> +            Error *local_err = NULL;
> +
>               if (vfio_multifd_transfer_enabled(vbasedev)) {
>                   error_report("%s: got DEV_CONFIG_STATE in main migration "
>                                "channel but doing multifd transfer",
> @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>                   return -EINVAL;
>               }
> 
> -            return vfio_load_device_config_state(f, opaque);
> +            ret = vfio_load_device_config_state(f, opaque);
> +            if (ret) {
> +                return ret;
> +            }
> +
> +            ret = vfio_migration_set_state_or_reset(
> +                vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
> +            if (ret) {
> +                error_report_err(local_err);
> +                return ret;
> +            }
> +
> +            return 0;
>           }
>           case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>           {
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 10c216d25d..fd498c864d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -91,6 +91,7 @@ enum mig_rp_message_type {
>       MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
>       MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
>       MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
> +    MIG_RP_MSG_ERROR, /* Tell source that destination encountered an error */
> 
>       MIG_RP_MSG_MAX
>   };
> @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
>       ret = qemu_loadvm_state(mis->from_src_file);
>       mis->loadvm_co = NULL;
> 
> +    if (ret) {
> +        migrate_send_rp_error(mis);
> +        error_report("SENT RP ERROR");
> +    }
> +
>   trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> 
>       ps = postcopy_state_get();
> @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
>       return true;
>   }
> +int migrate_send_rp_error(MigrationIncomingState *mis)
> +{
> +    return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
> +}
> +
>   int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
>   {
>       return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL);
> @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
>               trace_source_return_path_thread_switchover_acked();
>               break;
> 
> +        case MIG_RP_MSG_ERROR:
> +            error_setg(&err, "DST indicated error");
> +            goto out;
> +
>           default:
>               break;
>           }
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..f11ff7a199 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -553,6 +553,7 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>                                    char *block_name);
>   void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>   int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
> +int migrate_send_rp_error(MigrationIncomingState *mis);
> 
>   void dirty_bitmap_mig_before_vm_start(void);
>   void dirty_bitmap_mig_cancel_outgoing(void);
> 
>> I suggest you open an issue on :
>>
>>   https://gitlab.com/qemu-project/qemu/-/issues/
>>
>> with a detailed description of your environment :
>>
>>   Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>>
>> A template is provided when a new issue is created.
>>
>>
>> Thanks,
>>
>> C.
>>
>>
>>
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-12 14:08   ` Avihai Horon
  2025-08-12 14:34     ` Cédric Le Goater
@ 2025-08-12 14:56     ` Cédric Le Goater
  2025-08-13 12:18       ` Kunkun Jiang via
  2025-08-18  6:13       ` Avihai Horon
  2025-08-13 12:18     ` Kunkun Jiang via
  2 siblings, 2 replies; 11+ messages in thread
From: Cédric Le Goater @ 2025-08-12 14:56 UTC (permalink / raw)
  To: Avihai Horon, Kunkun Jiang, Alex Williamson, Yishai Hadas
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu

On 8/12/25 16:08, Avihai Horon wrote:
> 
> On 11/08/2025 19:34, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello,
>>
>> + Avihai
>>
>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>> Hi all,
>>>
>>> While testing VFIO migration, I encountered an corner scenario case:
>>> VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
>>>
>>> I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation.
>>
>> There is error handling in vfio_vmstate_change() :
>>
>>         /*
>>          * Migration should be aborted in this case, but vm_state_notify()
>>          * currently does not support reporting failures.
>>          */
>>         migration_file_set_error(ret, local_err);
> 
> Hmm, I think this only sets the error on src. On dst we don't have MigrationState->to_dst_file, so we end up just reporting the error.
> But even if we did set it, no one is checking if there is a migration error after vm_start() is called in process_incoming_migration_bh().
> 
>>
>>> Allowing the live migration process to continue could cause unrecoverable damage to the VM.
> 
> What do you mean by unrecoverable damage to the VM?
> If RESUMING->RUNNING transition fails, would a VFIO reset recover the device and allow the VM to continue operation with damage limited only to the VFIO device?
> 
>>> In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
>>>
>>> Looking forward to your reply.
>>
> The straightforward solution, as you suggested, is to exit dst upon error in RESUMING->RUNNING transition and notify about it to src through the return-path.
> However, I am not sure if failing the migration after vm_start() on dst is a bit late (as we start vCPUs and do migration_block_activate, etc.).
> 
> But I can think of another way to solve this, hopefully simpler.
> According to VFIO migration uAPI [1]:
>   * RESUMING -> STOP
>   *   Leaving RESUMING terminates a data transfer session and indicates the
>   *   device should complete processing of the data delivered by write(). The
>   *   kernel migration driver should complete the incorporation of data written
>   *   to the data transfer FD into the device internal state and perform
>   *   final validity and consistency checking of the new device state. If the
>   *   user provided data is found to be incomplete, inconsistent, or otherwise
>   *   invalid, the migration driver must fail the SET_STATE ioctl and
>   *   optionally go to the ERROR state as described below.
> 
> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the device config is loaded (which is the last data the device is expected to receive).
> If this transition fails, it means something was wrong with migration, and we can send src an error msg via return-path (and not continue to vm_start()).
> 
> Maybe this approach is less complicated than the first one, and it will also work if src VM was paused prior migration.
> I already tested some POC and it seems to be working (at least with an artificial error i injected in RESUMING->STOP transition).
> Kunkun, can you apply the following diff [3] and check if this solves the issue?
> 
> And in general, what do you think? Should we go with this approach or do you have other ideas?
> 
> Thanks.
> 
> [1] https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part of RESUMING->RUNNING transition.
> [3]

Avihai,

Could you please send an RFC patch with Peter and Fabiano in cc: ?
This will help to discuss the proposal and keep track of the issue.


Kunkun Jiang,

Could you please share details on your environment ?

Thanks,

C.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-12 14:34     ` Cédric Le Goater
@ 2025-08-12 14:58       ` Peter Xu
  2025-08-18  6:10         ` Avihai Horon
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2025-08-12 14:58 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Avihai Horon, Kunkun Jiang, Alex Williamson, Yishai Hadas,
	open list:All patches CC here, wanghaibin.wang, Zenghui Yu,
	Fabiano Rosas

On Tue, Aug 12, 2025 at 04:34:34PM +0200, Cédric Le Goater wrote:
> +peter
> +fabiano
> 
> On 8/12/25 16:08, Avihai Horon wrote:
> > 
> > On 11/08/2025 19:34, Cédric Le Goater wrote:
> > > External email: Use caution opening links or attachments
> > > 
> > > 
> > > Hello,
> > > 
> > > + Avihai
> > > 
> > > On 8/11/25 18:02, Kunkun Jiang wrote:
> > > > Hi all,
> > > > 
> > > > While testing VFIO migration, I encountered an corner scenario case:
> > > > VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
> > > > 
> > > > I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation.
> > > 
> > > There is error handling in vfio_vmstate_change() :
> > > 
> > >         /*
> > >          * Migration should be aborted in this case, but vm_state_notify()
> > >          * currently does not support reporting failures.
> > >          */
> > >         migration_file_set_error(ret, local_err);
> > 
> > Hmm, I think this only sets the error on src. On dst we don't have MigrationState->to_dst_file, so we end up just reporting the error.
> > But even if we did set it, no one is checking if there is a migration error after vm_start() is called in process_incoming_migration_bh().
> > 
> > > 
> > > > Allowing the live migration process to continue could cause unrecoverable damage to the VM.
> > 
> > What do you mean by unrecoverable damage to the VM?
> > If RESUMING->RUNNING transition fails, would a VFIO reset recover the device and allow the VM to continue operation with damage limited only to the VFIO device?
> > 
> > > > In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
> > > > 
> > > > Looking forward to your reply.
> > > 
> > The straightforward solution, as you suggested, is to exit dst upon error in RESUMING->RUNNING transition and notify about it to src through the return-path.
> > However, I am not sure if failing the migration after vm_start() on dst is a bit late (as we start vCPUs and do migration_block_activate, etc.).
> > 
> > But I can think of another way to solve this, hopefully simpler.
> > According to VFIO migration uAPI [1]:
> >   * RESUMING -> STOP
> >   *   Leaving RESUMING terminates a data transfer session and indicates the
> >   *   device should complete processing of the data delivered by write(). The
> >   *   kernel migration driver should complete the incorporation of data written
> >   *   to the data transfer FD into the device internal state and perform
> >   *   final validity and consistency checking of the new device state. If the
> >   *   user provided data is found to be incomplete, inconsistent, or otherwise
> >   *   invalid, the migration driver must fail the SET_STATE ioctl and
> >   *   optionally go to the ERROR state as described below.
> > 
> > So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the device config is loaded (which is the last data the device is expected to receive).
> > If this transition fails, it means something was wrong with migration, and we can send src an error msg via return-path (and not continue to vm_start()).
> > 
> > Maybe this approach is less complicated than the first one, and it will also work if src VM was paused prior migration.
> > I already tested some POC and it seems to be working (at least with an artificial error i injected in RESUMING->STOP transition).
> > Kunkun, can you apply the following diff [3] and check if this solves the issue?
> > 
> > And in general, what do you think? Should we go with this approach or do you have other ideas?
> > 
> > Thanks.
> > 
> > [1] https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
> > [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part of RESUMING->RUNNING transition.
> > [3]
> > 
> > diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
> > index e4785031a7..66f8461f02 100644
> > --- a/hw/vfio/migration-multifd.c
> > +++ b/hw/vfio/migration-multifd.c
> > @@ -267,6 +267,12 @@ static bool vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
> >       ret = vfio_load_device_config_state(f_in, vbasedev);
> >       bql_unlock();
> > 
> > +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> > +                                   VFIO_DEVICE_STATE_ERROR, errp);
> > +    if (ret) {
> > +        return false;
> > +    }
> > +
> >       if (ret < 0) {
> >           error_setg(errp, "%s: vfio_load_device_config_state() failed: %d",
> >                      vbasedev->name, ret);
> > 
> > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > index 4c06e3db93..a707d17a5b 100644
> > --- a/hw/vfio/migration.c
> > +++ b/hw/vfio/migration.c
> > @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> >           switch (data) {
> >           case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
> >           {
> > +            Error *local_err = NULL;
> > +
> >               if (vfio_multifd_transfer_enabled(vbasedev)) {
> >                   error_report("%s: got DEV_CONFIG_STATE in main migration "
> >                                "channel but doing multifd transfer",
> > @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> >                   return -EINVAL;
> >               }
> > 
> > -            return vfio_load_device_config_state(f, opaque);
> > +            ret = vfio_load_device_config_state(f, opaque);
> > +            if (ret) {
> > +                return ret;
> > +            }
> > +
> > +            ret = vfio_migration_set_state_or_reset(
> > +                vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
> > +            if (ret) {
> > +                error_report_err(local_err);
> > +                return ret;
> > +            }
> > +
> > +            return 0;
> >           }
> >           case VFIO_MIG_FLAG_DEV_SETUP_STATE:
> >           {
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 10c216d25d..fd498c864d 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -91,6 +91,7 @@ enum mig_rp_message_type {
> >       MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
> >       MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
> >       MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
> > +    MIG_RP_MSG_ERROR, /* Tell source that destination encountered an error */
> > 
> >       MIG_RP_MSG_MAX
> >   };
> > @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
> >       ret = qemu_loadvm_state(mis->from_src_file);
> >       mis->loadvm_co = NULL;
> > 
> > +    if (ret) {
> > +        migrate_send_rp_error(mis);
> > +        error_report("SENT RP ERROR");
> > +    }
> > +
> >   trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> > 
> >       ps = postcopy_state_get();
> > @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
> >       return true;
> >   }
> > +int migrate_send_rp_error(MigrationIncomingState *mis)
> > +{
> > +    return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
> > +}
> > +
> >   int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
> >   {
> >       return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL);
> > @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
> >               trace_source_return_path_thread_switchover_acked();
> >               break;
> > 
> > +        case MIG_RP_MSG_ERROR:
> > +            error_setg(&err, "DST indicated error");
> > +            goto out;

If this is only a boolean, we can reuse RP_SHUT.  Likely we could pass in
an error to migration_incoming_state_destroy():

diff --git a/migration/migration.c b/migration/migration.c
index 42a2a6e8f2..2ebba7838a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -441,7 +441,7 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
     }
 }
 
-void migration_incoming_state_destroy(void)
+void migration_incoming_state_destroy(bool has_error)
 {
     struct MigrationIncomingState *mis = migration_incoming_get_current();
 
@@ -466,8 +466,11 @@ void migration_incoming_state_destroy(void)
     qemu_loadvm_state_cleanup(mis);
 
     if (mis->to_src_file) {
-        /* Tell source that we are done */
-        migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
+        /* Tell source whether load succeeded */
+        if (!has_error) {
+            has_error = qemu_file_get_error(mis->from_src_file) != 0;
+        }
+        migrate_send_rp_shut(mis, has_error);
         qemu_fclose(mis->to_src_file);
         mis->to_src_file = NULL;
     }

Maybe it'll even work as late as process_incoming_migration_bh(), where
vm_start() could fail - right now it couldn't, but if there'll be an error
message reported upward then logically it can also set has_error=1 for the
RP_SHUT message.  Src QEMU relies on RP_SHUT message and retval=0 to quit
src QEMU, otherwise QEMU should fail the migration and restart VM on src.

> > +
> >           default:
> >               break;
> >           }
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 01329bf824..f11ff7a199 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -553,6 +553,7 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> >                                    char *block_name);
> >   void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
> >   int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
> > +int migrate_send_rp_error(MigrationIncomingState *mis);
> > 
> >   void dirty_bitmap_mig_before_vm_start(void);
> >   void dirty_bitmap_mig_cancel_outgoing(void);
> > 
> > > I suggest you open an issue on :
> > > 
> > >   https://gitlab.com/qemu-project/qemu/-/issues/
> > > 
> > > with a detailed description of your environment :
> > > 
> > >   Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
> > > 
> > > A template is provided when a new issue is created.
> > > 
> > > 
> > > Thanks,
> > > 
> > > C.
> > > 
> > > 
> > > 
> > 
> 

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-12 14:08   ` Avihai Horon
  2025-08-12 14:34     ` Cédric Le Goater
  2025-08-12 14:56     ` Cédric Le Goater
@ 2025-08-13 12:18     ` Kunkun Jiang via
  2025-08-18  6:44       ` Avihai Horon
  2 siblings, 1 reply; 11+ messages in thread
From: Kunkun Jiang via @ 2025-08-13 12:18 UTC (permalink / raw)
  To: Avihai Horon, Cédric Le Goater, Alex Williamson,
	Yishai Hadas
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu

Hi Avihai,

On 2025/8/12 22:08, Avihai Horon wrote:
> 
> On 11/08/2025 19:34, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello,
>>
>> + Avihai
>>
>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>> Hi all,
>>>
>>> While testing VFIO migration, I encountered an corner scenario case:
>>> VFIO migration will not be aborted when the vfio device of dst-vm 
>>> fails to transition from RESUMING to RUNNING state in 
>>> vfio_vmstate_change.
>>>
>>> I saw the comments in the vfio_vmstate_change but I don't understand 
>>> why no action is taken for this situation.
>>
>> There is error handling in vfio_vmstate_change() :
>>
>>         /*
>>          * Migration should be aborted in this case, but 
>> vm_state_notify()
>>          * currently does not support reporting failures.
>>          */
>>         migration_file_set_error(ret, local_err);
> 
> Hmm, I think this only sets the error on src. On dst we don't have 
> MigrationState->to_dst_file, so we end up just reporting the erro > But even if we did set it, no one is checking if there is a migration
> error after vm_start() is called in process_incoming_migration_bh().
Yes, that's what I mean.
> 
>>
>>> Allowing the live migration process to continue could cause 
>>> unrecoverable damage to the VM.
> 
> What do you mean by unrecoverable damage to the VM?
> If RESUMING->RUNNING transition fails, would a VFIO reset recover the 
> device and allow the VM to continue operation with damage limited only 
> to the VFIO device?
I didn't express myself clearly, let me explain again.
In my test, I ran a task inside the VM that was using this vfio device.
In this corner scenario, the task is aborted immediately and the VM is 
still running normally. I mean it is unacceptable that the task is 
aborted directly.
> 
>>> In this case, can we directly exit the dst-vm? Through the 
>>> return-path mechanism, the src-vm can continue to run.
>>>
>>> Looking forward to your reply.
>>
> The straightforward solution, as you suggested, is to exit dst upon 
> error in RESUMING->RUNNING transition and notify about it to src through 
> the return-path.
> However, I am not sure if failing the migration after vm_start() on dst 
> is a bit late (as we start vCPUs and do migration_block_activate, etc.).
I think vCPUs have not started yet. vCPUs will be started only after the 
all callbacks of vm_state_notify are executed.
If there are any problems with my analysis, please point them out.
> 
> But I can think of another way to solve this, hopefully simpler.
> According to VFIO migration uAPI [1]:
>   * RESUMING -> STOP
>   *   Leaving RESUMING terminates a data transfer session and indicates the
>   *   device should complete processing of the data delivered by 
> write(). The
>   *   kernel migration driver should complete the incorporation of data 
> written
>   *   to the data transfer FD into the device internal state and perform
>   *   final validity and consistency checking of the new device state. 
> If the
>   *   user provided data is found to be incomplete, inconsistent, or 
> otherwise
>   *   invalid, the migration driver must fail the SET_STATE ioctl and
>   *   optionally go to the ERROR state as described below.
> 
> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the 
> device config is loaded (which is the last data the device is expected 
> to receive).
> If this transition fails, it means something was wrong with migration, 
> and we can send src an error msg via return-path (and not continue to 
> vm_start()).
> 
> Maybe this approach is less complicated than the first one, and it will 
> also work if src VM was paused prior migration.
> I already tested some POC and it seems to be working (at least with an 
> artificial error i injected in RESUMING->STOP transition).
> Kunkun, can you apply the following diff [3] and check if this solves 
> the issue?
Ok, I'll try it.
> 
> And in general, what do you think? Should we go with this approach or do 
> you have other ideas?
My current approach is rather crude. As I said above, I think vcpu has 
not started yet, so I changed it like this.
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 4c06e3db93..70ccb706c6 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -878,7 +878,7 @@ static void vfio_vmstate_change_prepare(void 
*opaque, bool running,
  static void vfio_vmstate_change(void *opaque, bool running, RunState 
state)
  {
      VFIODevice *vbasedev = opaque;
-    enum vfio_device_mig_state new_state;
+    enum vfio_device_mig_state new_state, pre_state;
      Error *local_err = NULL;
      int ret;

@@ -899,6 +899,10 @@ static void vfio_vmstate_change(void *opaque, bool 
running, RunState state)
           * currently does not support reporting failures.
           */
          migration_file_set_error(ret, local_err);
+
+        if (pre_state == VFIO_DEVICE_STATE_RESUMING) {
+            exit(EXIT_FAILURE);
+        }
      }
> 
> Thanks.
> 
> [1] 
> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099 
> 
> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part 
> of RESUMING->RUNNING transition.
> [3]
> 
> diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
> index e4785031a7..66f8461f02 100644
> --- a/hw/vfio/migration-multifd.c
> +++ b/hw/vfio/migration-multifd.c
> @@ -267,6 +267,12 @@ static bool 
> vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
>       ret = vfio_load_device_config_state(f_in, vbasedev);
>       bql_unlock();
> 
> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> +                                   VFIO_DEVICE_STATE_ERROR, errp);
> +    if (ret) {
> +        return false;
> +    }
> +
>       if (ret < 0) {
>           error_setg(errp, "%s: vfio_load_device_config_state() failed: 
> %d",
>                      vbasedev->name, ret);
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 4c06e3db93..a707d17a5b 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void 
> *opaque, int version_id)
>           switch (data) {
>           case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>           {
> +            Error *local_err = NULL;
> +
>               if (vfio_multifd_transfer_enabled(vbasedev)) {
>                   error_report("%s: got DEV_CONFIG_STATE in main 
> migration "
>                                "channel but doing multifd transfer",
> @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void 
> *opaque, int version_id)
>                   return -EINVAL;
>               }
> 
> -            return vfio_load_device_config_state(f, opaque);
> +            ret = vfio_load_device_config_state(f, opaque);
> +            if (ret) {
> +                return ret;
> +            }
> +
> +            ret = vfio_migration_set_state_or_reset(
> +                vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
> +            if (ret) {
> +                error_report_err(local_err);
> +                return ret;
> +            }
> +
> +            return 0;
>           }
>           case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>           {
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 10c216d25d..fd498c864d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -91,6 +91,7 @@ enum mig_rp_message_type {
>       MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
>       MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to 
> resume */
>       MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
> +    MIG_RP_MSG_ERROR, /* Tell source that destination encountered an 
> error */
> 
>       MIG_RP_MSG_MAX
>   };
> @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
>       ret = qemu_loadvm_state(mis->from_src_file);
>       mis->loadvm_co = NULL;
> 
> +    if (ret) {
> +        migrate_send_rp_error(mis);
> +        error_report("SENT RP ERROR");
> +    }
> +
>   trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> 
>       ps = postcopy_state_get();
> @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
>       return true;
>   }
> +int migrate_send_rp_error(MigrationIncomingState *mis)
> +{
> +    return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
> +}
> +
>   int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
>   {
>       return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, 
> NULL);
> @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
>               trace_source_return_path_thread_switchover_acked();
>               break;
> 
> +        case MIG_RP_MSG_ERROR:
> +            error_setg(&err, "DST indicated error");
> +            goto out;
> +
>           default:
>               break;
>           }
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..f11ff7a199 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -553,6 +553,7 @@ void 
> migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>                                    char *block_name);
>   void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t 
> value);
>   int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
> +int migrate_send_rp_error(MigrationIncomingState *mis);
> 
>   void dirty_bitmap_mig_before_vm_start(void);
>   void dirty_bitmap_mig_cancel_outgoing(void);
> 
>> I suggest you open an issue on :
>>
>>   https://gitlab.com/qemu-project/qemu/-/issues/
>>
>> with a detailed description of your environment :
>>
>>   Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>>
>> A template is provided when a new issue is created.
>>
>>
>> Thanks,
>>
>> C.
>>
>>
>>
> .


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-12 14:56     ` Cédric Le Goater
@ 2025-08-13 12:18       ` Kunkun Jiang via
  2025-08-18  6:13       ` Avihai Horon
  1 sibling, 0 replies; 11+ messages in thread
From: Kunkun Jiang via @ 2025-08-13 12:18 UTC (permalink / raw)
  To: Cédric Le Goater, Avihai Horon, Alex Williamson,
	Yishai Hadas
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu

Hi Cédric,

On 2025/8/12 22:56, Cédric Le Goater wrote:
> On 8/12/25 16:08, Avihai Horon wrote:
>>
>> On 11/08/2025 19:34, Cédric Le Goater wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hello,
>>>
>>> + Avihai
>>>
>>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>>> Hi all,
>>>>
>>>> While testing VFIO migration, I encountered an corner scenario case:
>>>> VFIO migration will not be aborted when the vfio device of dst-vm 
>>>> fails to transition from RESUMING to RUNNING state in 
>>>> vfio_vmstate_change.
>>>>
>>>> I saw the comments in the vfio_vmstate_change but I don't understand 
>>>> why no action is taken for this situation.
>>>
>>> There is error handling in vfio_vmstate_change() :
>>>
>>>         /*
>>>          * Migration should be aborted in this case, but 
>>> vm_state_notify()
>>>          * currently does not support reporting failures.
>>>          */
>>>         migration_file_set_error(ret, local_err);
>>
>> Hmm, I think this only sets the error on src. On dst we don't have 
>> MigrationState->to_dst_file, so we end up just reporting the error.
>> But even if we did set it, no one is checking if there is a migration 
>> error after vm_start() is called in process_incoming_migration_bh().
>>
>>>
>>>> Allowing the live migration process to continue could cause 
>>>> unrecoverable damage to the VM.
>>
>> What do you mean by unrecoverable damage to the VM?
>> If RESUMING->RUNNING transition fails, would a VFIO reset recover the 
>> device and allow the VM to continue operation with damage limited only 
>> to the VFIO device?
>>
>>>> In this case, can we directly exit the dst-vm? Through the 
>>>> return-path mechanism, the src-vm can continue to run.
>>>>
>>>> Looking forward to your reply.
>>>
>> The straightforward solution, as you suggested, is to exit dst upon 
>> error in RESUMING->RUNNING transition and notify about it to src 
>> through the return-path.
>> However, I am not sure if failing the migration after vm_start() on 
>> dst is a bit late (as we start vCPUs and do migration_block_activate, 
>> etc.).
>>
>> But I can think of another way to solve this, hopefully simpler.
>> According to VFIO migration uAPI [1]:
>>   * RESUMING -> STOP
>>   *   Leaving RESUMING terminates a data transfer session and 
>> indicates the
>>   *   device should complete processing of the data delivered by 
>> write(). The
>>   *   kernel migration driver should complete the incorporation of 
>> data written
>>   *   to the data transfer FD into the device internal state and perform
>>   *   final validity and consistency checking of the new device state. 
>> If the
>>   *   user provided data is found to be incomplete, inconsistent, or 
>> otherwise
>>   *   invalid, the migration driver must fail the SET_STATE ioctl and
>>   *   optionally go to the ERROR state as described below.
>>
>> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after 
>> the device config is loaded (which is the last data the device is 
>> expected to receive).
>> If this transition fails, it means something was wrong with migration, 
>> and we can send src an error msg via return-path (and not continue to 
>> vm_start()).
>>
>> Maybe this approach is less complicated than the first one, and it 
>> will also work if src VM was paused prior migration.
>> I already tested some POC and it seems to be working (at least with an 
>> artificial error i injected in RESUMING->STOP transition).
>> Kunkun, can you apply the following diff [3] and check if this solves 
>> the issue?
>>
>> And in general, what do you think? Should we go with this approach or 
>> do you have other ideas?
>>
>> Thanks.
>>
>> [1] 
>> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099 
>>
>> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part 
>> of RESUMING->RUNNING transition.
>> [3]
> 
> Avihai,
> 
> Could you please send an RFC patch with Peter and Fabiano in cc: ?
> This will help to discuss the proposal and keep track of the issue.
> 
> 
> Kunkun Jiang,
> 
> Could you please share details on your environment ?
Sorry for the late reply.My test scenario is this:
Server:		Kunpeng-920
VFUI device:	SEC, a Kunpeng hardware accelerator
Host OS:	openEuler 24.03, kernel 6.6, qemu 8.2
Guest OS:	openEuler 24.03
Test steps:
1. Start a VM and run a task using SEC inside the VM
2. Start migration and inject an error into the SEC used by dst-VM
3. The SEC used by dst-VM will be reset. So RESUMING->RUNNING transition 
will fail with probability.(The timing has to be just right)

> 
> Thanks,
> 
> C.
> 
> 
> .


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-12 14:58       ` Peter Xu
@ 2025-08-18  6:10         ` Avihai Horon
  0 siblings, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2025-08-18  6:10 UTC (permalink / raw)
  To: Peter Xu, Cédric Le Goater
  Cc: Kunkun Jiang, Alex Williamson, Yishai Hadas,
	open list:All patches CC here, wanghaibin.wang, Zenghui Yu,
	Fabiano Rosas


On 12/08/2025 17:58, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, Aug 12, 2025 at 04:34:34PM +0200, Cédric Le Goater wrote:
>> +peter
>> +fabiano
>>
>> On 8/12/25 16:08, Avihai Horon wrote:
>>> On 11/08/2025 19:34, Cédric Le Goater wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> Hello,
>>>>
>>>> + Avihai
>>>>
>>>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>>>> Hi all,
>>>>>
>>>>> While testing VFIO migration, I encountered an corner scenario case:
>>>>> VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
>>>>>
>>>>> I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation.
>>>> There is error handling in vfio_vmstate_change() :
>>>>
>>>>          /*
>>>>           * Migration should be aborted in this case, but vm_state_notify()
>>>>           * currently does not support reporting failures.
>>>>           */
>>>>          migration_file_set_error(ret, local_err);
>>> Hmm, I think this only sets the error on src. On dst we don't have MigrationState->to_dst_file, so we end up just reporting the error.
>>> But even if we did set it, no one is checking if there is a migration error after vm_start() is called in process_incoming_migration_bh().
>>>
>>>>> Allowing the live migration process to continue could cause unrecoverable damage to the VM.
>>> What do you mean by unrecoverable damage to the VM?
>>> If RESUMING->RUNNING transition fails, would a VFIO reset recover the device and allow the VM to continue operation with damage limited only to the VFIO device?
>>>
>>>>> In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
>>>>>
>>>>> Looking forward to your reply.
>>> The straightforward solution, as you suggested, is to exit dst upon error in RESUMING->RUNNING transition and notify about it to src through the return-path.
>>> However, I am not sure if failing the migration after vm_start() on dst is a bit late (as we start vCPUs and do migration_block_activate, etc.).
>>>
>>> But I can think of another way to solve this, hopefully simpler.
>>> According to VFIO migration uAPI [1]:
>>>    * RESUMING -> STOP
>>>    *   Leaving RESUMING terminates a data transfer session and indicates the
>>>    *   device should complete processing of the data delivered by write(). The
>>>    *   kernel migration driver should complete the incorporation of data written
>>>    *   to the data transfer FD into the device internal state and perform
>>>    *   final validity and consistency checking of the new device state. If the
>>>    *   user provided data is found to be incomplete, inconsistent, or otherwise
>>>    *   invalid, the migration driver must fail the SET_STATE ioctl and
>>>    *   optionally go to the ERROR state as described below.
>>>
>>> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the device config is loaded (which is the last data the device is expected to receive).
>>> If this transition fails, it means something was wrong with migration, and we can send src an error msg via return-path (and not continue to vm_start()).
>>>
>>> Maybe this approach is less complicated than the first one, and it will also work if src VM was paused prior migration.
>>> I already tested some POC and it seems to be working (at least with an artificial error i injected in RESUMING->STOP transition).
>>> Kunkun, can you apply the following diff [3] and check if this solves the issue?
>>>
>>> And in general, what do you think? Should we go with this approach or do you have other ideas?
>>>
>>> Thanks.
>>>
>>> [1] https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
>>> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part of RESUMING->RUNNING transition.
>>> [3]
>>>
>>> diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
>>> index e4785031a7..66f8461f02 100644
>>> --- a/hw/vfio/migration-multifd.c
>>> +++ b/hw/vfio/migration-multifd.c
>>> @@ -267,6 +267,12 @@ static bool vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
>>>        ret = vfio_load_device_config_state(f_in, vbasedev);
>>>        bql_unlock();
>>>
>>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>>> +                                   VFIO_DEVICE_STATE_ERROR, errp);
>>> +    if (ret) {
>>> +        return false;
>>> +    }
>>> +
>>>        if (ret < 0) {
>>>            error_setg(errp, "%s: vfio_load_device_config_state() failed: %d",
>>>                       vbasedev->name, ret);
>>>
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 4c06e3db93..a707d17a5b 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>>            switch (data) {
>>>            case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>>>            {
>>> +            Error *local_err = NULL;
>>> +
>>>                if (vfio_multifd_transfer_enabled(vbasedev)) {
>>>                    error_report("%s: got DEV_CONFIG_STATE in main migration "
>>>                                 "channel but doing multifd transfer",
>>> @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>>                    return -EINVAL;
>>>                }
>>>
>>> -            return vfio_load_device_config_state(f, opaque);
>>> +            ret = vfio_load_device_config_state(f, opaque);
>>> +            if (ret) {
>>> +                return ret;
>>> +            }
>>> +
>>> +            ret = vfio_migration_set_state_or_reset(
>>> +                vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
>>> +            if (ret) {
>>> +                error_report_err(local_err);
>>> +                return ret;
>>> +            }
>>> +
>>> +            return 0;
>>>            }
>>>            case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>>>            {
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 10c216d25d..fd498c864d 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -91,6 +91,7 @@ enum mig_rp_message_type {
>>>        MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
>>>        MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
>>>        MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
>>> +    MIG_RP_MSG_ERROR, /* Tell source that destination encountered an error */
>>>
>>>        MIG_RP_MSG_MAX
>>>    };
>>> @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
>>>        ret = qemu_loadvm_state(mis->from_src_file);
>>>        mis->loadvm_co = NULL;
>>>
>>> +    if (ret) {
>>> +        migrate_send_rp_error(mis);
>>> +        error_report("SENT RP ERROR");
>>> +    }
>>> +
>>>    trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
>>>
>>>        ps = postcopy_state_get();
>>> @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
>>>        return true;
>>>    }
>>> +int migrate_send_rp_error(MigrationIncomingState *mis)
>>> +{
>>> +    return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
>>> +}
>>> +
>>>    int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
>>>    {
>>>        return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL);
>>> @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
>>>                trace_source_return_path_thread_switchover_acked();
>>>                break;
>>>
>>> +        case MIG_RP_MSG_ERROR:
>>> +            error_setg(&err, "DST indicated error");
>>> +            goto out;
> If this is only a boolean, we can reuse RP_SHUT.  Likely we could pass in
> an error to migration_incoming_state_destroy():
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 42a2a6e8f2..2ebba7838a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -441,7 +441,7 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
>       }
>   }
>
> -void migration_incoming_state_destroy(void)
> +void migration_incoming_state_destroy(bool has_error)
>   {
>       struct MigrationIncomingState *mis = migration_incoming_get_current();
>
> @@ -466,8 +466,11 @@ void migration_incoming_state_destroy(void)
>       qemu_loadvm_state_cleanup(mis);
>
>       if (mis->to_src_file) {
> -        /* Tell source that we are done */
> -        migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
> +        /* Tell source whether load succeeded */
> +        if (!has_error) {
> +            has_error = qemu_file_get_error(mis->from_src_file) != 0;
> +        }
> +        migrate_send_rp_shut(mis, has_error);
>           qemu_fclose(mis->to_src_file);
>           mis->to_src_file = NULL;
>       }
>
> Maybe it'll even work as late as process_incoming_migration_bh(), where
> vm_start() could fail - right now it couldn't, but if there'll be an error
> message reported upward then logically it can also set has_error=1 for the
> RP_SHUT message.  Src QEMU relies on RP_SHUT message and retval=0 to quit
> src QEMU, otherwise QEMU should fail the migration and restart VM on src.

Ah, nice! I didn't notice RP_SHUT can take an error value.

Will try that, thanks!

>
>>> +
>>>            default:
>>>                break;
>>>            }
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index 01329bf824..f11ff7a199 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -553,6 +553,7 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>>>                                     char *block_name);
>>>    void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>>>    int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
>>> +int migrate_send_rp_error(MigrationIncomingState *mis);
>>>
>>>    void dirty_bitmap_mig_before_vm_start(void);
>>>    void dirty_bitmap_mig_cancel_outgoing(void);
>>>
>>>> I suggest you open an issue on :
>>>>
>>>>    https://gitlab.com/qemu-project/qemu/-/issues/
>>>>
>>>> with a detailed description of your environment :
>>>>
>>>>    Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>>>>
>>>> A template is provided when a new issue is created.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> C.
>>>>
>>>>
>>>>
> --
> Peter Xu
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-12 14:56     ` Cédric Le Goater
  2025-08-13 12:18       ` Kunkun Jiang via
@ 2025-08-18  6:13       ` Avihai Horon
  1 sibling, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2025-08-18  6:13 UTC (permalink / raw)
  To: Cédric Le Goater, Kunkun Jiang, Alex Williamson,
	Yishai Hadas
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu


On 12/08/2025 17:56, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> On 8/12/25 16:08, Avihai Horon wrote:
>>
>> On 11/08/2025 19:34, Cédric Le Goater wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hello,
>>>
>>> + Avihai
>>>
>>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>>> Hi all,
>>>>
>>>> While testing VFIO migration, I encountered an corner scenario case:
>>>> VFIO migration will not be aborted when the vfio device of dst-vm 
>>>> fails to transition from RESUMING to RUNNING state in 
>>>> vfio_vmstate_change.
>>>>
>>>> I saw the comments in the vfio_vmstate_change but I don't 
>>>> understand why no action is taken for this situation.
>>>
>>> There is error handling in vfio_vmstate_change() :
>>>
>>>         /*
>>>          * Migration should be aborted in this case, but 
>>> vm_state_notify()
>>>          * currently does not support reporting failures.
>>>          */
>>>         migration_file_set_error(ret, local_err);
>>
>> Hmm, I think this only sets the error on src. On dst we don't have 
>> MigrationState->to_dst_file, so we end up just reporting the error.
>> But even if we did set it, no one is checking if there is a migration 
>> error after vm_start() is called in process_incoming_migration_bh().
>>
>>>
>>>> Allowing the live migration process to continue could cause 
>>>> unrecoverable damage to the VM.
>>
>> What do you mean by unrecoverable damage to the VM?
>> If RESUMING->RUNNING transition fails, would a VFIO reset recover the 
>> device and allow the VM to continue operation with damage limited 
>> only to the VFIO device?
>>
>>>> In this case, can we directly exit the dst-vm? Through the 
>>>> return-path mechanism, the src-vm can continue to run.
>>>>
>>>> Looking forward to your reply.
>>>
>> The straightforward solution, as you suggested, is to exit dst upon 
>> error in RESUMING->RUNNING transition and notify about it to src 
>> through the return-path.
>> However, I am not sure if failing the migration after vm_start() on 
>> dst is a bit late (as we start vCPUs and do migration_block_activate, 
>> etc.).
>>
>> But I can think of another way to solve this, hopefully simpler.
>> According to VFIO migration uAPI [1]:
>>   * RESUMING -> STOP
>>   *   Leaving RESUMING terminates a data transfer session and 
>> indicates the
>>   *   device should complete processing of the data delivered by 
>> write(). The
>>   *   kernel migration driver should complete the incorporation of 
>> data written
>>   *   to the data transfer FD into the device internal state and perform
>>   *   final validity and consistency checking of the new device 
>> state. If the
>>   *   user provided data is found to be incomplete, inconsistent, or 
>> otherwise
>>   *   invalid, the migration driver must fail the SET_STATE ioctl and
>>   *   optionally go to the ERROR state as described below.
>>
>> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after 
>> the device config is loaded (which is the last data the device is 
>> expected to receive).
>> If this transition fails, it means something was wrong with 
>> migration, and we can send src an error msg via return-path (and not 
>> continue to vm_start()).
>>
>> Maybe this approach is less complicated than the first one, and it 
>> will also work if src VM was paused prior migration.
>> I already tested some POC and it seems to be working (at least with 
>> an artificial error i injected in RESUMING->STOP transition).
>> Kunkun, can you apply the following diff [3] and check if this solves 
>> the issue?
>>
>> And in general, what do you think? Should we go with this approach or 
>> do you have other ideas?
>>
>> Thanks.
>>
>> [1] 
>> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
>> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as 
>> part of RESUMING->RUNNING transition.
>> [3]
>
> Avihai,
>
> Could you please send an RFC patch with Peter and Fabiano in cc: ?
> This will help to discuss the proposal and keep track of the issue.
>
Yes, of course. But I am a bit busy with other stuff, so it may take me 
a few days to do that.

Of course, If anyone would like, feel free to send your own proposal.

Thanks.

>
> Kunkun Jiang,
>
> Could you please share details on your environment ?
>
> Thanks,
>
> C.
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] VFIO migration will not be aborted in a corner scenario
  2025-08-13 12:18     ` Kunkun Jiang via
@ 2025-08-18  6:44       ` Avihai Horon
  0 siblings, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2025-08-18  6:44 UTC (permalink / raw)
  To: Kunkun Jiang, Cédric Le Goater, Alex Williamson,
	Yishai Hadas
  Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu


On 13/08/2025 15:18, Kunkun Jiang wrote:
> External email: Use caution opening links or attachments
>
>
> Hi Avihai,
>
> On 2025/8/12 22:08, Avihai Horon wrote:
>>
>> On 11/08/2025 19:34, Cédric Le Goater wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hello,
>>>
>>> + Avihai
>>>
>>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>>> Hi all,
>>>>
>>>> While testing VFIO migration, I encountered an corner scenario case:
>>>> VFIO migration will not be aborted when the vfio device of dst-vm
>>>> fails to transition from RESUMING to RUNNING state in
>>>> vfio_vmstate_change.
>>>>
>>>> I saw the comments in the vfio_vmstate_change but I don't understand
>>>> why no action is taken for this situation.
>>>
>>> There is error handling in vfio_vmstate_change() :
>>>
>>>         /*
>>>          * Migration should be aborted in this case, but
>>> vm_state_notify()
>>>          * currently does not support reporting failures.
>>>          */
>>>         migration_file_set_error(ret, local_err);
>>
>> Hmm, I think this only sets the error on src. On dst we don't have
>> MigrationState->to_dst_file, so we end up just reporting the erro > 
>> But even if we did set it, no one is checking if there is a migration
>> error after vm_start() is called in process_incoming_migration_bh().
> Yes, that's what I mean.
>>
>>>
>>>> Allowing the live migration process to continue could cause
>>>> unrecoverable damage to the VM.
>>
>> What do you mean by unrecoverable damage to the VM?
>> If RESUMING->RUNNING transition fails, would a VFIO reset recover the
>> device and allow the VM to continue operation with damage limited only
>> to the VFIO device?
> I didn't express myself clearly, let me explain again.
> In my test, I ran a task inside the VM that was using this vfio device.
> In this corner scenario, the task is aborted immediately and the VM is
> still running normally. I mean it is unacceptable that the task is
> aborted directly.

Ah I understand.

>>
>>>> In this case, can we directly exit the dst-vm? Through the
>>>> return-path mechanism, the src-vm can continue to run.
>>>>
>>>> Looking forward to your reply.
>>>
>> The straightforward solution, as you suggested, is to exit dst upon
>> error in RESUMING->RUNNING transition and notify about it to src through
>> the return-path.
>> However, I am not sure if failing the migration after vm_start() on dst
>> is a bit late (as we start vCPUs and do migration_block_activate, etc.).
> I think vCPUs have not started yet. vCPUs will be started only after the
> all callbacks of vm_state_notify are executed.

Yes, in that case need to refactor vm_prepare_start/vm_state_notify to 
handle errors. May be a bit involved, need to further look into that.

Thanks.

> If there are any problems with my analysis, please point them out.
>>
>> But I can think of another way to solve this, hopefully simpler.
>> According to VFIO migration uAPI [1]:
>>   * RESUMING -> STOP
>>   *   Leaving RESUMING terminates a data transfer session and 
>> indicates the
>>   *   device should complete processing of the data delivered by
>> write(). The
>>   *   kernel migration driver should complete the incorporation of data
>> written
>>   *   to the data transfer FD into the device internal state and perform
>>   *   final validity and consistency checking of the new device state.
>> If the
>>   *   user provided data is found to be incomplete, inconsistent, or
>> otherwise
>>   *   invalid, the migration driver must fail the SET_STATE ioctl and
>>   *   optionally go to the ERROR state as described below.
>>
>> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the
>> device config is loaded (which is the last data the device is expected
>> to receive).
>> If this transition fails, it means something was wrong with migration,
>> and we can send src an error msg via return-path (and not continue to
>> vm_start()).
>>
>> Maybe this approach is less complicated than the first one, and it will
>> also work if src VM was paused prior migration.
>> I already tested some POC and it seems to be working (at least with an
>> artificial error i injected in RESUMING->STOP transition).
>> Kunkun, can you apply the following diff [3] and check if this solves
>> the issue?
> Ok, I'll try it.
>>
>> And in general, what do you think? Should we go with this approach or do
>> you have other ideas?
> My current approach is rather crude. As I said above, I think vcpu has
> not started yet, so I changed it like this.
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 4c06e3db93..70ccb706c6 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -878,7 +878,7 @@ static void vfio_vmstate_change_prepare(void
> *opaque, bool running,
>  static void vfio_vmstate_change(void *opaque, bool running, RunState
> state)
>  {
>      VFIODevice *vbasedev = opaque;
> -    enum vfio_device_mig_state new_state;
> +    enum vfio_device_mig_state new_state, pre_state;
>      Error *local_err = NULL;
>      int ret;
>
> @@ -899,6 +899,10 @@ static void vfio_vmstate_change(void *opaque, bool
> running, RunState state)
>           * currently does not support reporting failures.
>           */
>          migration_file_set_error(ret, local_err);
> +
> +        if (pre_state == VFIO_DEVICE_STATE_RESUMING) {
> +            exit(EXIT_FAILURE);
> +        }
>      }
>>
>> Thanks.
>>
>> [1]
>> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099 
>>
>>
>> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part
>> of RESUMING->RUNNING transition.
>> [3]
>>
>> diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
>> index e4785031a7..66f8461f02 100644
>> --- a/hw/vfio/migration-multifd.c
>> +++ b/hw/vfio/migration-multifd.c
>> @@ -267,6 +267,12 @@ static bool
>> vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
>>       ret = vfio_load_device_config_state(f_in, vbasedev);
>>       bql_unlock();
>>
>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> +                                   VFIO_DEVICE_STATE_ERROR, errp);
>> +    if (ret) {
>> +        return false;
>> +    }
>> +
>>       if (ret < 0) {
>>           error_setg(errp, "%s: vfio_load_device_config_state() failed:
>> %d",
>>                      vbasedev->name, ret);
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 4c06e3db93..a707d17a5b 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void
>> *opaque, int version_id)
>>           switch (data) {
>>           case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>>           {
>> +            Error *local_err = NULL;
>> +
>>               if (vfio_multifd_transfer_enabled(vbasedev)) {
>>                   error_report("%s: got DEV_CONFIG_STATE in main
>> migration "
>>                                "channel but doing multifd transfer",
>> @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void
>> *opaque, int version_id)
>>                   return -EINVAL;
>>               }
>>
>> -            return vfio_load_device_config_state(f, opaque);
>> +            ret = vfio_load_device_config_state(f, opaque);
>> +            if (ret) {
>> +                return ret;
>> +            }
>> +
>> +            ret = vfio_migration_set_state_or_reset(
>> +                vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
>> +            if (ret) {
>> +                error_report_err(local_err);
>> +                return ret;
>> +            }
>> +
>> +            return 0;
>>           }
>>           case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>>           {
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 10c216d25d..fd498c864d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -91,6 +91,7 @@ enum mig_rp_message_type {
>>       MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
>>       MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to
>> resume */
>>       MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do 
>> switchover */
>> +    MIG_RP_MSG_ERROR, /* Tell source that destination encountered an
>> error */
>>
>>       MIG_RP_MSG_MAX
>>   };
>> @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
>>       ret = qemu_loadvm_state(mis->from_src_file);
>>       mis->loadvm_co = NULL;
>>
>> +    if (ret) {
>> +        migrate_send_rp_error(mis);
>> +        error_report("SENT RP ERROR");
>> +    }
>> +
>> trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
>>
>>       ps = postcopy_state_get();
>> @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
>>       return true;
>>   }
>> +int migrate_send_rp_error(MigrationIncomingState *mis)
>> +{
>> +    return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
>> +}
>> +
>>   int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
>>   {
>>       return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0,
>> NULL);
>> @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void 
>> *opaque)
>> trace_source_return_path_thread_switchover_acked();
>>               break;
>>
>> +        case MIG_RP_MSG_ERROR:
>> +            error_setg(&err, "DST indicated error");
>> +            goto out;
>> +
>>           default:
>>               break;
>>           }
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 01329bf824..f11ff7a199 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -553,6 +553,7 @@ void
>> migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>>                                    char *block_name);
>>   void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t
>> value);
>>   int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
>> +int migrate_send_rp_error(MigrationIncomingState *mis);
>>
>>   void dirty_bitmap_mig_before_vm_start(void);
>>   void dirty_bitmap_mig_cancel_outgoing(void);
>>
>>> I suggest you open an issue on :
>>>
>>> https://gitlab.com/qemu-project/qemu/-/issues/
>>>
>>> with a detailed description of your environment :
>>>
>>>   Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>>>
>>> A template is provided when a new issue is created.
>>>
>>>
>>> Thanks,
>>>
>>> C.
>>>
>>>
>>>
>> .


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-08-18  6:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-11 16:02 [Question] VFIO migration will not be aborted in a corner scenario Kunkun Jiang via
2025-08-11 16:34 ` Cédric Le Goater
2025-08-12 14:08   ` Avihai Horon
2025-08-12 14:34     ` Cédric Le Goater
2025-08-12 14:58       ` Peter Xu
2025-08-18  6:10         ` Avihai Horon
2025-08-12 14:56     ` Cédric Le Goater
2025-08-13 12:18       ` Kunkun Jiang via
2025-08-18  6:13       ` Avihai Horon
2025-08-13 12:18     ` Kunkun Jiang via
2025-08-18  6:44       ` Avihai Horon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).