* Re: [Question] VFIO migration will not be aborted in a corner scenario
2025-08-12 14:08 ` Avihai Horon
@ 2025-08-12 14:34 ` Cédric Le Goater
2025-08-12 14:58 ` Peter Xu
2025-08-12 14:56 ` Cédric Le Goater
2025-08-13 12:18 ` Kunkun Jiang via
2 siblings, 1 reply; 11+ messages in thread
From: Cédric Le Goater @ 2025-08-12 14:34 UTC (permalink / raw)
To: Avihai Horon, Kunkun Jiang, Alex Williamson, Yishai Hadas
Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu,
'Peter Xu', Fabiano Rosas
+peter
+fabiano
On 8/12/25 16:08, Avihai Horon wrote:
>
> On 11/08/2025 19:34, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello,
>>
>> + Avihai
>>
>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>> Hi all,
>>>
>>> While testing VFIO migration, I encountered an corner scenario case:
>>> VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
>>>
>>> I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation.
>>
>> There is error handling in vfio_vmstate_change() :
>>
>> /*
>> * Migration should be aborted in this case, but vm_state_notify()
>> * currently does not support reporting failures.
>> */
>> migration_file_set_error(ret, local_err);
>
> Hmm, I think this only sets the error on src. On dst we don't have MigrationState->to_dst_file, so we end up just reporting the error.
> But even if we did set it, no one is checking if there is a migration error after vm_start() is called in process_incoming_migration_bh().
>
>>
>>> Allowing the live migration process to continue could cause unrecoverable damage to the VM.
>
> What do you mean by unrecoverable damage to the VM?
> If RESUMING->RUNNING transition fails, would a VFIO reset recover the device and allow the VM to continue operation with damage limited only to the VFIO device?
>
>>> In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
>>>
>>> Looking forward to your reply.
>>
> The straightforward solution, as you suggested, is to exit dst upon error in RESUMING->RUNNING transition and notify about it to src through the return-path.
> However, I am not sure if failing the migration after vm_start() on dst is a bit late (as we start vCPUs and do migration_block_activate, etc.).
>
> But I can think of another way to solve this, hopefully simpler.
> According to VFIO migration uAPI [1]:
> * RESUMING -> STOP
> * Leaving RESUMING terminates a data transfer session and indicates the
> * device should complete processing of the data delivered by write(). The
> * kernel migration driver should complete the incorporation of data written
> * to the data transfer FD into the device internal state and perform
> * final validity and consistency checking of the new device state. If the
> * user provided data is found to be incomplete, inconsistent, or otherwise
> * invalid, the migration driver must fail the SET_STATE ioctl and
> * optionally go to the ERROR state as described below.
>
> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the device config is loaded (which is the last data the device is expected to receive).
> If this transition fails, it means something was wrong with migration, and we can send src an error msg via return-path (and not continue to vm_start()).
>
> Maybe this approach is less complicated than the first one, and it will also work if src VM was paused prior migration.
> I already tested some POC and it seems to be working (at least with an artificial error i injected in RESUMING->STOP transition).
> Kunkun, can you apply the following diff [3] and check if this solves the issue?
>
> And in general, what do you think? Should we go with this approach or do you have other ideas?
>
> Thanks.
>
> [1] https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part of RESUMING->RUNNING transition.
> [3]
>
> diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
> index e4785031a7..66f8461f02 100644
> --- a/hw/vfio/migration-multifd.c
> +++ b/hw/vfio/migration-multifd.c
> @@ -267,6 +267,12 @@ static bool vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
> ret = vfio_load_device_config_state(f_in, vbasedev);
> bql_unlock();
>
> + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> + VFIO_DEVICE_STATE_ERROR, errp);
> + if (ret) {
> + return false;
> + }
> +
> if (ret < 0) {
> error_setg(errp, "%s: vfio_load_device_config_state() failed: %d",
> vbasedev->name, ret);
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 4c06e3db93..a707d17a5b 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> switch (data) {
> case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
> {
> + Error *local_err = NULL;
> +
> if (vfio_multifd_transfer_enabled(vbasedev)) {
> error_report("%s: got DEV_CONFIG_STATE in main migration "
> "channel but doing multifd transfer",
> @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> return -EINVAL;
> }
>
> - return vfio_load_device_config_state(f, opaque);
> + ret = vfio_load_device_config_state(f, opaque);
> + if (ret) {
> + return ret;
> + }
> +
> + ret = vfio_migration_set_state_or_reset(
> + vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
> + if (ret) {
> + error_report_err(local_err);
> + return ret;
> + }
> +
> + return 0;
> }
> case VFIO_MIG_FLAG_DEV_SETUP_STATE:
> {
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 10c216d25d..fd498c864d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -91,6 +91,7 @@ enum mig_rp_message_type {
> MIG_RP_MSG_RECV_BITMAP, /* send recved_bitmap back to source */
> MIG_RP_MSG_RESUME_ACK, /* tell source that we are ready to resume */
> MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
> + MIG_RP_MSG_ERROR, /* Tell source that destination encountered an error */
>
> MIG_RP_MSG_MAX
> };
> @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
> ret = qemu_loadvm_state(mis->from_src_file);
> mis->loadvm_co = NULL;
>
> + if (ret) {
> + migrate_send_rp_error(mis);
> + error_report("SENT RP ERROR");
> + }
> +
> trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
>
> ps = postcopy_state_get();
> @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
> return true;
> }
> +int migrate_send_rp_error(MigrationIncomingState *mis)
> +{
> + return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
> +}
> +
> int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
> {
> return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL);
> @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
> trace_source_return_path_thread_switchover_acked();
> break;
>
> + case MIG_RP_MSG_ERROR:
> + error_setg(&err, "DST indicated error");
> + goto out;
> +
> default:
> break;
> }
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..f11ff7a199 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -553,6 +553,7 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> char *block_name);
> void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
> int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
> +int migrate_send_rp_error(MigrationIncomingState *mis);
>
> void dirty_bitmap_mig_before_vm_start(void);
> void dirty_bitmap_mig_cancel_outgoing(void);
>
>> I suggest you open an issue on :
>>
>> https://gitlab.com/qemu-project/qemu/-/issues/
>>
>> with a detailed description of your environment :
>>
>> Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>>
>> A template is provided when a new issue is created.
>>
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [Question] VFIO migration will not be aborted in a corner scenario
2025-08-12 14:34 ` Cédric Le Goater
@ 2025-08-12 14:58 ` Peter Xu
2025-08-18 6:10 ` Avihai Horon
0 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2025-08-12 14:58 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Avihai Horon, Kunkun Jiang, Alex Williamson, Yishai Hadas,
open list:All patches CC here, wanghaibin.wang, Zenghui Yu,
Fabiano Rosas
On Tue, Aug 12, 2025 at 04:34:34PM +0200, Cédric Le Goater wrote:
> +peter
> +fabiano
>
> On 8/12/25 16:08, Avihai Horon wrote:
> >
> > On 11/08/2025 19:34, Cédric Le Goater wrote:
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > Hello,
> > >
> > > + Avihai
> > >
> > > On 8/11/25 18:02, Kunkun Jiang wrote:
> > > > Hi all,
> > > >
> > > > While testing VFIO migration, I encountered an corner scenario case:
> > > > VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
> > > >
> > > > I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation.
> > >
> > > There is error handling in vfio_vmstate_change() :
> > >
> > > /*
> > > * Migration should be aborted in this case, but vm_state_notify()
> > > * currently does not support reporting failures.
> > > */
> > > migration_file_set_error(ret, local_err);
> >
> > Hmm, I think this only sets the error on src. On dst we don't have MigrationState->to_dst_file, so we end up just reporting the error.
> > But even if we did set it, no one is checking if there is a migration error after vm_start() is called in process_incoming_migration_bh().
> >
> > >
> > > > Allowing the live migration process to continue could cause unrecoverable damage to the VM.
> >
> > What do you mean by unrecoverable damage to the VM?
> > If RESUMING->RUNNING transition fails, would a VFIO reset recover the device and allow the VM to continue operation with damage limited only to the VFIO device?
> >
> > > > In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
> > > >
> > > > Looking forward to your reply.
> > >
> > The straightforward solution, as you suggested, is to exit dst upon error in RESUMING->RUNNING transition and notify about it to src through the return-path.
> > However, I am not sure if failing the migration after vm_start() on dst is a bit late (as we start vCPUs and do migration_block_activate, etc.).
> >
> > But I can think of another way to solve this, hopefully simpler.
> > According to VFIO migration uAPI [1]:
> > * RESUMING -> STOP
> > * Leaving RESUMING terminates a data transfer session and indicates the
> > * device should complete processing of the data delivered by write(). The
> > * kernel migration driver should complete the incorporation of data written
> > * to the data transfer FD into the device internal state and perform
> > * final validity and consistency checking of the new device state. If the
> > * user provided data is found to be incomplete, inconsistent, or otherwise
> > * invalid, the migration driver must fail the SET_STATE ioctl and
> > * optionally go to the ERROR state as described below.
> >
> > So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the device config is loaded (which is the last data the device is expected to receive).
> > If this transition fails, it means something was wrong with migration, and we can send src an error msg via return-path (and not continue to vm_start()).
> >
> > Maybe this approach is less complicated than the first one, and it will also work if src VM was paused prior migration.
> > I already tested some POC and it seems to be working (at least with an artificial error i injected in RESUMING->STOP transition).
> > Kunkun, can you apply the following diff [3] and check if this solves the issue?
> >
> > And in general, what do you think? Should we go with this approach or do you have other ideas?
> >
> > Thanks.
> >
> > [1] https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
> > [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part of RESUMING->RUNNING transition.
> > [3]
> >
> > diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
> > index e4785031a7..66f8461f02 100644
> > --- a/hw/vfio/migration-multifd.c
> > +++ b/hw/vfio/migration-multifd.c
> > @@ -267,6 +267,12 @@ static bool vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
> > ret = vfio_load_device_config_state(f_in, vbasedev);
> > bql_unlock();
> >
> > + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> > + VFIO_DEVICE_STATE_ERROR, errp);
> > + if (ret) {
> > + return false;
> > + }
> > +
> > if (ret < 0) {
> > error_setg(errp, "%s: vfio_load_device_config_state() failed: %d",
> > vbasedev->name, ret);
> >
> > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > index 4c06e3db93..a707d17a5b 100644
> > --- a/hw/vfio/migration.c
> > +++ b/hw/vfio/migration.c
> > @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> > switch (data) {
> > case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
> > {
> > + Error *local_err = NULL;
> > +
> > if (vfio_multifd_transfer_enabled(vbasedev)) {
> > error_report("%s: got DEV_CONFIG_STATE in main migration "
> > "channel but doing multifd transfer",
> > @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> > return -EINVAL;
> > }
> >
> > - return vfio_load_device_config_state(f, opaque);
> > + ret = vfio_load_device_config_state(f, opaque);
> > + if (ret) {
> > + return ret;
> > + }
> > +
> > + ret = vfio_migration_set_state_or_reset(
> > + vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
> > + if (ret) {
> > + error_report_err(local_err);
> > + return ret;
> > + }
> > +
> > + return 0;
> > }
> > case VFIO_MIG_FLAG_DEV_SETUP_STATE:
> > {
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 10c216d25d..fd498c864d 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -91,6 +91,7 @@ enum mig_rp_message_type {
> > MIG_RP_MSG_RECV_BITMAP, /* send recved_bitmap back to source */
> > MIG_RP_MSG_RESUME_ACK, /* tell source that we are ready to resume */
> > MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
> > + MIG_RP_MSG_ERROR, /* Tell source that destination encountered an error */
> >
> > MIG_RP_MSG_MAX
> > };
> > @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
> > ret = qemu_loadvm_state(mis->from_src_file);
> > mis->loadvm_co = NULL;
> >
> > + if (ret) {
> > + migrate_send_rp_error(mis);
> > + error_report("SENT RP ERROR");
> > + }
> > +
> > trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> >
> > ps = postcopy_state_get();
> > @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
> > return true;
> > }
> > +int migrate_send_rp_error(MigrationIncomingState *mis)
> > +{
> > + return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
> > +}
> > +
> > int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
> > {
> > return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL);
> > @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
> > trace_source_return_path_thread_switchover_acked();
> > break;
> >
> > + case MIG_RP_MSG_ERROR:
> > + error_setg(&err, "DST indicated error");
> > + goto out;
If this is only a boolean, we can reuse RP_SHUT. Likely we could pass in
an error to migration_incoming_state_destroy():
diff --git a/migration/migration.c b/migration/migration.c
index 42a2a6e8f2..2ebba7838a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -441,7 +441,7 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
}
}
-void migration_incoming_state_destroy(void)
+void migration_incoming_state_destroy(bool has_error)
{
struct MigrationIncomingState *mis = migration_incoming_get_current();
@@ -466,8 +466,11 @@ void migration_incoming_state_destroy(void)
qemu_loadvm_state_cleanup(mis);
if (mis->to_src_file) {
- /* Tell source that we are done */
- migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
+ /* Tell source whether load succeeded */
+ if (!has_error) {
+ has_error = qemu_file_get_error(mis->from_src_file) != 0;
+ }
+ migrate_send_rp_shut(mis, has_error);
qemu_fclose(mis->to_src_file);
mis->to_src_file = NULL;
}
Maybe it'll even work as late as process_incoming_migration_bh(), where
vm_start() could fail - right now it couldn't, but if there'll be an error
message reported upward then logically it can also set has_error=1 for the
RP_SHUT message. Src QEMU relies on RP_SHUT message and retval=0 to quit
src QEMU, otherwise QEMU should fail the migration and restart VM on src.
> > +
> > default:
> > break;
> > }
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 01329bf824..f11ff7a199 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -553,6 +553,7 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> > char *block_name);
> > void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
> > int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
> > +int migrate_send_rp_error(MigrationIncomingState *mis);
> >
> > void dirty_bitmap_mig_before_vm_start(void);
> > void dirty_bitmap_mig_cancel_outgoing(void);
> >
> > > I suggest you open an issue on :
> > >
> > > https://gitlab.com/qemu-project/qemu/-/issues/
> > >
> > > with a detailed description of your environment :
> > >
> > > Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
> > >
> > > A template is provided when a new issue is created.
> > >
> > >
> > > Thanks,
> > >
> > > C.
> > >
> > >
> > >
> >
>
--
Peter Xu
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [Question] VFIO migration will not be aborted in a corner scenario
2025-08-12 14:58 ` Peter Xu
@ 2025-08-18 6:10 ` Avihai Horon
0 siblings, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2025-08-18 6:10 UTC (permalink / raw)
To: Peter Xu, Cédric Le Goater
Cc: Kunkun Jiang, Alex Williamson, Yishai Hadas,
open list:All patches CC here, wanghaibin.wang, Zenghui Yu,
Fabiano Rosas
On 12/08/2025 17:58, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, Aug 12, 2025 at 04:34:34PM +0200, Cédric Le Goater wrote:
>> +peter
>> +fabiano
>>
>> On 8/12/25 16:08, Avihai Horon wrote:
>>> On 11/08/2025 19:34, Cédric Le Goater wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> Hello,
>>>>
>>>> + Avihai
>>>>
>>>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>>>> Hi all,
>>>>>
>>>>> While testing VFIO migration, I encountered an corner scenario case:
>>>>> VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
>>>>>
>>>>> I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation.
>>>> There is error handling in vfio_vmstate_change() :
>>>>
>>>> /*
>>>> * Migration should be aborted in this case, but vm_state_notify()
>>>> * currently does not support reporting failures.
>>>> */
>>>> migration_file_set_error(ret, local_err);
>>> Hmm, I think this only sets the error on src. On dst we don't have MigrationState->to_dst_file, so we end up just reporting the error.
>>> But even if we did set it, no one is checking if there is a migration error after vm_start() is called in process_incoming_migration_bh().
>>>
>>>>> Allowing the live migration process to continue could cause unrecoverable damage to the VM.
>>> What do you mean by unrecoverable damage to the VM?
>>> If RESUMING->RUNNING transition fails, would a VFIO reset recover the device and allow the VM to continue operation with damage limited only to the VFIO device?
>>>
>>>>> In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
>>>>>
>>>>> Looking forward to your reply.
>>> The straightforward solution, as you suggested, is to exit dst upon error in RESUMING->RUNNING transition and notify about it to src through the return-path.
>>> However, I am not sure if failing the migration after vm_start() on dst is a bit late (as we start vCPUs and do migration_block_activate, etc.).
>>>
>>> But I can think of another way to solve this, hopefully simpler.
>>> According to VFIO migration uAPI [1]:
>>> * RESUMING -> STOP
>>> * Leaving RESUMING terminates a data transfer session and indicates the
>>> * device should complete processing of the data delivered by write(). The
>>> * kernel migration driver should complete the incorporation of data written
>>> * to the data transfer FD into the device internal state and perform
>>> * final validity and consistency checking of the new device state. If the
>>> * user provided data is found to be incomplete, inconsistent, or otherwise
>>> * invalid, the migration driver must fail the SET_STATE ioctl and
>>> * optionally go to the ERROR state as described below.
>>>
>>> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the device config is loaded (which is the last data the device is expected to receive).
>>> If this transition fails, it means something was wrong with migration, and we can send src an error msg via return-path (and not continue to vm_start()).
>>>
>>> Maybe this approach is less complicated than the first one, and it will also work if src VM was paused prior migration.
>>> I already tested some POC and it seems to be working (at least with an artificial error i injected in RESUMING->STOP transition).
>>> Kunkun, can you apply the following diff [3] and check if this solves the issue?
>>>
>>> And in general, what do you think? Should we go with this approach or do you have other ideas?
>>>
>>> Thanks.
>>>
>>> [1] https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
>>> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part of RESUMING->RUNNING transition.
>>> [3]
>>>
>>> diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
>>> index e4785031a7..66f8461f02 100644
>>> --- a/hw/vfio/migration-multifd.c
>>> +++ b/hw/vfio/migration-multifd.c
>>> @@ -267,6 +267,12 @@ static bool vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
>>> ret = vfio_load_device_config_state(f_in, vbasedev);
>>> bql_unlock();
>>>
>>> + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>>> + VFIO_DEVICE_STATE_ERROR, errp);
>>> + if (ret) {
>>> + return false;
>>> + }
>>> +
>>> if (ret < 0) {
>>> error_setg(errp, "%s: vfio_load_device_config_state() failed: %d",
>>> vbasedev->name, ret);
>>>
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 4c06e3db93..a707d17a5b 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>> switch (data) {
>>> case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>>> {
>>> + Error *local_err = NULL;
>>> +
>>> if (vfio_multifd_transfer_enabled(vbasedev)) {
>>> error_report("%s: got DEV_CONFIG_STATE in main migration "
>>> "channel but doing multifd transfer",
>>> @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>> return -EINVAL;
>>> }
>>>
>>> - return vfio_load_device_config_state(f, opaque);
>>> + ret = vfio_load_device_config_state(f, opaque);
>>> + if (ret) {
>>> + return ret;
>>> + }
>>> +
>>> + ret = vfio_migration_set_state_or_reset(
>>> + vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
>>> + if (ret) {
>>> + error_report_err(local_err);
>>> + return ret;
>>> + }
>>> +
>>> + return 0;
>>> }
>>> case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>>> {
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 10c216d25d..fd498c864d 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -91,6 +91,7 @@ enum mig_rp_message_type {
>>> MIG_RP_MSG_RECV_BITMAP, /* send recved_bitmap back to source */
>>> MIG_RP_MSG_RESUME_ACK, /* tell source that we are ready to resume */
>>> MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
>>> + MIG_RP_MSG_ERROR, /* Tell source that destination encountered an error */
>>>
>>> MIG_RP_MSG_MAX
>>> };
>>> @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
>>> ret = qemu_loadvm_state(mis->from_src_file);
>>> mis->loadvm_co = NULL;
>>>
>>> + if (ret) {
>>> + migrate_send_rp_error(mis);
>>> + error_report("SENT RP ERROR");
>>> + }
>>> +
>>> trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
>>>
>>> ps = postcopy_state_get();
>>> @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
>>> return true;
>>> }
>>> +int migrate_send_rp_error(MigrationIncomingState *mis)
>>> +{
>>> + return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
>>> +}
>>> +
>>> int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
>>> {
>>> return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL);
>>> @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
>>> trace_source_return_path_thread_switchover_acked();
>>> break;
>>>
>>> + case MIG_RP_MSG_ERROR:
>>> + error_setg(&err, "DST indicated error");
>>> + goto out;
> If this is only a boolean, we can reuse RP_SHUT. Likely we could pass in
> an error to migration_incoming_state_destroy():
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 42a2a6e8f2..2ebba7838a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -441,7 +441,7 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
> }
> }
>
> -void migration_incoming_state_destroy(void)
> +void migration_incoming_state_destroy(bool has_error)
> {
> struct MigrationIncomingState *mis = migration_incoming_get_current();
>
> @@ -466,8 +466,11 @@ void migration_incoming_state_destroy(void)
> qemu_loadvm_state_cleanup(mis);
>
> if (mis->to_src_file) {
> - /* Tell source that we are done */
> - migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
> + /* Tell source whether load succeeded */
> + if (!has_error) {
> + has_error = qemu_file_get_error(mis->from_src_file) != 0;
> + }
> + migrate_send_rp_shut(mis, has_error);
> qemu_fclose(mis->to_src_file);
> mis->to_src_file = NULL;
> }
>
> Maybe it'll even work as late as process_incoming_migration_bh(), where
> vm_start() could fail - right now it couldn't, but if there'll be an error
> message reported upward then logically it can also set has_error=1 for the
> RP_SHUT message. Src QEMU relies on RP_SHUT message and retval=0 to quit
> src QEMU, otherwise QEMU should fail the migration and restart VM on src.
Ah, nice! I didn't notice RP_SHUT can take an error value.
Will try that, thanks!
>
>>> +
>>> default:
>>> break;
>>> }
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index 01329bf824..f11ff7a199 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -553,6 +553,7 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>>> char *block_name);
>>> void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>>> int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
>>> +int migrate_send_rp_error(MigrationIncomingState *mis);
>>>
>>> void dirty_bitmap_mig_before_vm_start(void);
>>> void dirty_bitmap_mig_cancel_outgoing(void);
>>>
>>>> I suggest you open an issue on :
>>>>
>>>> https://gitlab.com/qemu-project/qemu/-/issues/
>>>>
>>>> with a detailed description of your environment :
>>>>
>>>> Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>>>>
>>>> A template is provided when a new issue is created.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> C.
>>>>
>>>>
>>>>
> --
> Peter Xu
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Question] VFIO migration will not be aborted in a corner scenario
2025-08-12 14:08 ` Avihai Horon
2025-08-12 14:34 ` Cédric Le Goater
@ 2025-08-12 14:56 ` Cédric Le Goater
2025-08-13 12:18 ` Kunkun Jiang via
2025-08-18 6:13 ` Avihai Horon
2025-08-13 12:18 ` Kunkun Jiang via
2 siblings, 2 replies; 11+ messages in thread
From: Cédric Le Goater @ 2025-08-12 14:56 UTC (permalink / raw)
To: Avihai Horon, Kunkun Jiang, Alex Williamson, Yishai Hadas
Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu
On 8/12/25 16:08, Avihai Horon wrote:
>
> On 11/08/2025 19:34, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello,
>>
>> + Avihai
>>
>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>> Hi all,
>>>
>>> While testing VFIO migration, I encountered an corner scenario case:
>>> VFIO migration will not be aborted when the vfio device of dst-vm fails to transition from RESUMING to RUNNING state in vfio_vmstate_change.
>>>
>>> I saw the comments in the vfio_vmstate_change but I don't understand why no action is taken for this situation.
>>
>> There is error handling in vfio_vmstate_change() :
>>
>> /*
>> * Migration should be aborted in this case, but vm_state_notify()
>> * currently does not support reporting failures.
>> */
>> migration_file_set_error(ret, local_err);
>
> Hmm, I think this only sets the error on src. On dst we don't have MigrationState->to_dst_file, so we end up just reporting the error.
> But even if we did set it, no one is checking if there is a migration error after vm_start() is called in process_incoming_migration_bh().
>
>>
>>> Allowing the live migration process to continue could cause unrecoverable damage to the VM.
>
> What do you mean by unrecoverable damage to the VM?
> If RESUMING->RUNNING transition fails, would a VFIO reset recover the device and allow the VM to continue operation with damage limited only to the VFIO device?
>
>>> In this case, can we directly exit the dst-vm? Through the return-path mechanism, the src-vm can continue to run.
>>>
>>> Looking forward to your reply.
>>
> The straightforward solution, as you suggested, is to exit dst upon error in RESUMING->RUNNING transition and notify about it to src through the return-path.
> However, I am not sure if failing the migration after vm_start() on dst is a bit late (as we start vCPUs and do migration_block_activate, etc.).
>
> But I can think of another way to solve this, hopefully simpler.
> According to VFIO migration uAPI [1]:
> * RESUMING -> STOP
> * Leaving RESUMING terminates a data transfer session and indicates the
> * device should complete processing of the data delivered by write(). The
> * kernel migration driver should complete the incorporation of data written
> * to the data transfer FD into the device internal state and perform
> * final validity and consistency checking of the new device state. If the
> * user provided data is found to be incomplete, inconsistent, or otherwise
> * invalid, the migration driver must fail the SET_STATE ioctl and
> * optionally go to the ERROR state as described below.
>
> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the device config is loaded (which is the last data the device is expected to receive).
> If this transition fails, it means something was wrong with migration, and we can send src an error msg via return-path (and not continue to vm_start()).
>
> Maybe this approach is less complicated than the first one, and it will also work if src VM was paused prior migration.
> I already tested some POC and it seems to be working (at least with an artificial error i injected in RESUMING->STOP transition).
> Kunkun, can you apply the following diff [3] and check if this solves the issue?
>
> And in general, what do you think? Should we go with this approach or do you have other ideas?
>
> Thanks.
>
> [1] https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part of RESUMING->RUNNING transition.
> [3]
Avihai,
Could you please send an RFC patch with Peter and Fabiano in cc: ?
This will help to discuss the proposal and keep track of the issue.
Kunkun Jiang,
Could you please share details on your environment ?
Thanks,
C.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Question] VFIO migration will not be aborted in a corner scenario
2025-08-12 14:56 ` Cédric Le Goater
@ 2025-08-13 12:18 ` Kunkun Jiang via
2025-08-18 6:13 ` Avihai Horon
1 sibling, 0 replies; 11+ messages in thread
From: Kunkun Jiang via @ 2025-08-13 12:18 UTC (permalink / raw)
To: Cédric Le Goater, Avihai Horon, Alex Williamson,
Yishai Hadas
Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu
Hi Cédric,
On 2025/8/12 22:56, Cédric Le Goater wrote:
> On 8/12/25 16:08, Avihai Horon wrote:
>>
>> On 11/08/2025 19:34, Cédric Le Goater wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hello,
>>>
>>> + Avihai
>>>
>>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>>> Hi all,
>>>>
>>>> While testing VFIO migration, I encountered an corner scenario case:
>>>> VFIO migration will not be aborted when the vfio device of dst-vm
>>>> fails to transition from RESUMING to RUNNING state in
>>>> vfio_vmstate_change.
>>>>
>>>> I saw the comments in the vfio_vmstate_change but I don't understand
>>>> why no action is taken for this situation.
>>>
>>> There is error handling in vfio_vmstate_change() :
>>>
>>> /*
>>> * Migration should be aborted in this case, but
>>> vm_state_notify()
>>> * currently does not support reporting failures.
>>> */
>>> migration_file_set_error(ret, local_err);
>>
>> Hmm, I think this only sets the error on src. On dst we don't have
>> MigrationState->to_dst_file, so we end up just reporting the error.
>> But even if we did set it, no one is checking if there is a migration
>> error after vm_start() is called in process_incoming_migration_bh().
>>
>>>
>>>> Allowing the live migration process to continue could cause
>>>> unrecoverable damage to the VM.
>>
>> What do you mean by unrecoverable damage to the VM?
>> If RESUMING->RUNNING transition fails, would a VFIO reset recover the
>> device and allow the VM to continue operation with damage limited only
>> to the VFIO device?
>>
>>>> In this case, can we directly exit the dst-vm? Through the
>>>> return-path mechanism, the src-vm can continue to run.
>>>>
>>>> Looking forward to your reply.
>>>
>> The straightforward solution, as you suggested, is to exit dst upon
>> error in RESUMING->RUNNING transition and notify about it to src
>> through the return-path.
>> However, I am not sure if failing the migration after vm_start() on
>> dst is a bit late (as we start vCPUs and do migration_block_activate,
>> etc.).
>>
>> But I can think of another way to solve this, hopefully simpler.
>> According to VFIO migration uAPI [1]:
>> * RESUMING -> STOP
>> * Leaving RESUMING terminates a data transfer session and
>> indicates the
>> * device should complete processing of the data delivered by
>> write(). The
>> * kernel migration driver should complete the incorporation of
>> data written
>> * to the data transfer FD into the device internal state and perform
>> * final validity and consistency checking of the new device state.
>> If the
>> * user provided data is found to be incomplete, inconsistent, or
>> otherwise
>> * invalid, the migration driver must fail the SET_STATE ioctl and
>> * optionally go to the ERROR state as described below.
>>
>> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after
>> the device config is loaded (which is the last data the device is
>> expected to receive).
>> If this transition fails, it means something was wrong with migration,
>> and we can send src an error msg via return-path (and not continue to
>> vm_start()).
>>
>> Maybe this approach is less complicated than the first one, and it
>> will also work if src VM was paused prior migration.
>> I already tested some POC and it seems to be working (at least with an
>> artificial error i injected in RESUMING->STOP transition).
>> Kunkun, can you apply the following diff [3] and check if this solves
>> the issue?
>>
>> And in general, what do you think? Should we go with this approach or
>> do you have other ideas?
>>
>> Thanks.
>>
>> [1]
>> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
>>
>> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part
>> of RESUMING->RUNNING transition.
>> [3]
>
> Avihai,
>
> Could you please send an RFC patch with Peter and Fabiano in cc: ?
> This will help to discuss the proposal and keep track of the issue.
>
>
> Kunkun Jiang,
>
> Could you please share details on your environment ?
Sorry for the late reply.My test scenario is this:
Server: Kunpeng-920
VFUI device: SEC, a Kunpeng hardware accelerator
Host OS: openEuler 24.03, kernel 6.6, qemu 8.2
Guest OS: openEuler 24.03
Test steps:
1. Start a VM and run a task using SEC inside the VM
2. Start migration and inject an error into the SEC used by dst-VM
3. The SEC used by dst-VM will be reset. So RESUMING->RUNNING transition
will fail with probability.(The timing has to be just right)
>
> Thanks,
>
> C.
>
>
> .
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Question] VFIO migration will not be aborted in a corner scenario
2025-08-12 14:56 ` Cédric Le Goater
2025-08-13 12:18 ` Kunkun Jiang via
@ 2025-08-18 6:13 ` Avihai Horon
1 sibling, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2025-08-18 6:13 UTC (permalink / raw)
To: Cédric Le Goater, Kunkun Jiang, Alex Williamson,
Yishai Hadas
Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu
On 12/08/2025 17:56, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> On 8/12/25 16:08, Avihai Horon wrote:
>>
>> On 11/08/2025 19:34, Cédric Le Goater wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hello,
>>>
>>> + Avihai
>>>
>>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>>> Hi all,
>>>>
>>>> While testing VFIO migration, I encountered an corner scenario case:
>>>> VFIO migration will not be aborted when the vfio device of dst-vm
>>>> fails to transition from RESUMING to RUNNING state in
>>>> vfio_vmstate_change.
>>>>
>>>> I saw the comments in the vfio_vmstate_change but I don't
>>>> understand why no action is taken for this situation.
>>>
>>> There is error handling in vfio_vmstate_change() :
>>>
>>> /*
>>> * Migration should be aborted in this case, but
>>> vm_state_notify()
>>> * currently does not support reporting failures.
>>> */
>>> migration_file_set_error(ret, local_err);
>>
>> Hmm, I think this only sets the error on src. On dst we don't have
>> MigrationState->to_dst_file, so we end up just reporting the error.
>> But even if we did set it, no one is checking if there is a migration
>> error after vm_start() is called in process_incoming_migration_bh().
>>
>>>
>>>> Allowing the live migration process to continue could cause
>>>> unrecoverable damage to the VM.
>>
>> What do you mean by unrecoverable damage to the VM?
>> If RESUMING->RUNNING transition fails, would a VFIO reset recover the
>> device and allow the VM to continue operation with damage limited
>> only to the VFIO device?
>>
>>>> In this case, can we directly exit the dst-vm? Through the
>>>> return-path mechanism, the src-vm can continue to run.
>>>>
>>>> Looking forward to your reply.
>>>
>> The straightforward solution, as you suggested, is to exit dst upon
>> error in RESUMING->RUNNING transition and notify about it to src
>> through the return-path.
>> However, I am not sure if failing the migration after vm_start() on
>> dst is a bit late (as we start vCPUs and do migration_block_activate,
>> etc.).
>>
>> But I can think of another way to solve this, hopefully simpler.
>> According to VFIO migration uAPI [1]:
>> * RESUMING -> STOP
>> * Leaving RESUMING terminates a data transfer session and
>> indicates the
>> * device should complete processing of the data delivered by
>> write(). The
>> * kernel migration driver should complete the incorporation of
>> data written
>> * to the data transfer FD into the device internal state and perform
>> * final validity and consistency checking of the new device
>> state. If the
>> * user provided data is found to be incomplete, inconsistent, or
>> otherwise
>> * invalid, the migration driver must fail the SET_STATE ioctl and
>> * optionally go to the ERROR state as described below.
>>
>> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after
>> the device config is loaded (which is the last data the device is
>> expected to receive).
>> If this transition fails, it means something was wrong with
>> migration, and we can send src an error msg via return-path (and not
>> continue to vm_start()).
>>
>> Maybe this approach is less complicated than the first one, and it
>> will also work if src VM was paused prior migration.
>> I already tested some POC and it seems to be working (at least with
>> an artificial error i injected in RESUMING->STOP transition).
>> Kunkun, can you apply the following diff [3] and check if this solves
>> the issue?
>>
>> And in general, what do you think? Should we go with this approach or
>> do you have other ideas?
>>
>> Thanks.
>>
>> [1]
>> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
>> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as
>> part of RESUMING->RUNNING transition.
>> [3]
>
> Avihai,
>
> Could you please send an RFC patch with Peter and Fabiano in cc: ?
> This will help to discuss the proposal and keep track of the issue.
>
Yes, of course. But I am a bit busy with other stuff, so it may take me
a few days to do that.
Of course, If anyone would like, feel free to send your own proposal.
Thanks.
>
> Kunkun Jiang,
>
> Could you please share details on your environment ?
>
> Thanks,
>
> C.
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Question] VFIO migration will not be aborted in a corner scenario
2025-08-12 14:08 ` Avihai Horon
2025-08-12 14:34 ` Cédric Le Goater
2025-08-12 14:56 ` Cédric Le Goater
@ 2025-08-13 12:18 ` Kunkun Jiang via
2025-08-18 6:44 ` Avihai Horon
2 siblings, 1 reply; 11+ messages in thread
From: Kunkun Jiang via @ 2025-08-13 12:18 UTC (permalink / raw)
To: Avihai Horon, Cédric Le Goater, Alex Williamson,
Yishai Hadas
Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu
Hi Avihai,
On 2025/8/12 22:08, Avihai Horon wrote:
>
> On 11/08/2025 19:34, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello,
>>
>> + Avihai
>>
>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>> Hi all,
>>>
>>> While testing VFIO migration, I encountered an corner scenario case:
>>> VFIO migration will not be aborted when the vfio device of dst-vm
>>> fails to transition from RESUMING to RUNNING state in
>>> vfio_vmstate_change.
>>>
>>> I saw the comments in the vfio_vmstate_change but I don't understand
>>> why no action is taken for this situation.
>>
>> There is error handling in vfio_vmstate_change() :
>>
>> /*
>> * Migration should be aborted in this case, but
>> vm_state_notify()
>> * currently does not support reporting failures.
>> */
>> migration_file_set_error(ret, local_err);
>
> Hmm, I think this only sets the error on src. On dst we don't have
> MigrationState->to_dst_file, so we end up just reporting the erro > But even if we did set it, no one is checking if there is a migration
> error after vm_start() is called in process_incoming_migration_bh().
Yes, that's what I mean.
>
>>
>>> Allowing the live migration process to continue could cause
>>> unrecoverable damage to the VM.
>
> What do you mean by unrecoverable damage to the VM?
> If RESUMING->RUNNING transition fails, would a VFIO reset recover the
> device and allow the VM to continue operation with damage limited only
> to the VFIO device?
I didn't express myself clearly, let me explain again.
In my test, I ran a task inside the VM that was using this vfio device.
In this corner scenario, the task is aborted immediately and the VM is
still running normally. I mean it is unacceptable that the task is
aborted directly.
>
>>> In this case, can we directly exit the dst-vm? Through the
>>> return-path mechanism, the src-vm can continue to run.
>>>
>>> Looking forward to your reply.
>>
> The straightforward solution, as you suggested, is to exit dst upon
> error in RESUMING->RUNNING transition and notify about it to src through
> the return-path.
> However, I am not sure if failing the migration after vm_start() on dst
> is a bit late (as we start vCPUs and do migration_block_activate, etc.).
I think vCPUs have not started yet. vCPUs will be started only after the
all callbacks of vm_state_notify are executed.
If there are any problems with my analysis, please point them out.
>
> But I can think of another way to solve this, hopefully simpler.
> According to VFIO migration uAPI [1]:
> * RESUMING -> STOP
> * Leaving RESUMING terminates a data transfer session and indicates the
> * device should complete processing of the data delivered by
> write(). The
> * kernel migration driver should complete the incorporation of data
> written
> * to the data transfer FD into the device internal state and perform
> * final validity and consistency checking of the new device state.
> If the
> * user provided data is found to be incomplete, inconsistent, or
> otherwise
> * invalid, the migration driver must fail the SET_STATE ioctl and
> * optionally go to the ERROR state as described below.
>
> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the
> device config is loaded (which is the last data the device is expected
> to receive).
> If this transition fails, it means something was wrong with migration,
> and we can send src an error msg via return-path (and not continue to
> vm_start()).
>
> Maybe this approach is less complicated than the first one, and it will
> also work if src VM was paused prior migration.
> I already tested some POC and it seems to be working (at least with an
> artificial error i injected in RESUMING->STOP transition).
> Kunkun, can you apply the following diff [3] and check if this solves
> the issue?
Ok, I'll try it.
>
> And in general, what do you think? Should we go with this approach or do
> you have other ideas?
My current approach is rather crude. As I said above, I think vcpu has
not started yet, so I changed it like this.
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 4c06e3db93..70ccb706c6 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -878,7 +878,7 @@ static void vfio_vmstate_change_prepare(void
*opaque, bool running,
static void vfio_vmstate_change(void *opaque, bool running, RunState
state)
{
VFIODevice *vbasedev = opaque;
- enum vfio_device_mig_state new_state;
+ enum vfio_device_mig_state new_state, pre_state;
Error *local_err = NULL;
int ret;
@@ -899,6 +899,10 @@ static void vfio_vmstate_change(void *opaque, bool
running, RunState state)
* currently does not support reporting failures.
*/
migration_file_set_error(ret, local_err);
+
+ if (pre_state == VFIO_DEVICE_STATE_RESUMING) {
+ exit(EXIT_FAILURE);
+ }
}
>
> Thanks.
>
> [1]
> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
>
> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part
> of RESUMING->RUNNING transition.
> [3]
>
> diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
> index e4785031a7..66f8461f02 100644
> --- a/hw/vfio/migration-multifd.c
> +++ b/hw/vfio/migration-multifd.c
> @@ -267,6 +267,12 @@ static bool
> vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
> ret = vfio_load_device_config_state(f_in, vbasedev);
> bql_unlock();
>
> + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> + VFIO_DEVICE_STATE_ERROR, errp);
> + if (ret) {
> + return false;
> + }
> +
> if (ret < 0) {
> error_setg(errp, "%s: vfio_load_device_config_state() failed:
> %d",
> vbasedev->name, ret);
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 4c06e3db93..a707d17a5b 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void
> *opaque, int version_id)
> switch (data) {
> case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
> {
> + Error *local_err = NULL;
> +
> if (vfio_multifd_transfer_enabled(vbasedev)) {
> error_report("%s: got DEV_CONFIG_STATE in main
> migration "
> "channel but doing multifd transfer",
> @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void
> *opaque, int version_id)
> return -EINVAL;
> }
>
> - return vfio_load_device_config_state(f, opaque);
> + ret = vfio_load_device_config_state(f, opaque);
> + if (ret) {
> + return ret;
> + }
> +
> + ret = vfio_migration_set_state_or_reset(
> + vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
> + if (ret) {
> + error_report_err(local_err);
> + return ret;
> + }
> +
> + return 0;
> }
> case VFIO_MIG_FLAG_DEV_SETUP_STATE:
> {
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 10c216d25d..fd498c864d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -91,6 +91,7 @@ enum mig_rp_message_type {
> MIG_RP_MSG_RECV_BITMAP, /* send recved_bitmap back to source */
> MIG_RP_MSG_RESUME_ACK, /* tell source that we are ready to
> resume */
> MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */
> + MIG_RP_MSG_ERROR, /* Tell source that destination encountered an
> error */
>
> MIG_RP_MSG_MAX
> };
> @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
> ret = qemu_loadvm_state(mis->from_src_file);
> mis->loadvm_co = NULL;
>
> + if (ret) {
> + migrate_send_rp_error(mis);
> + error_report("SENT RP ERROR");
> + }
> +
> trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
>
> ps = postcopy_state_get();
> @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
> return true;
> }
> +int migrate_send_rp_error(MigrationIncomingState *mis)
> +{
> + return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
> +}
> +
> int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
> {
> return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0,
> NULL);
> @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void *opaque)
> trace_source_return_path_thread_switchover_acked();
> break;
>
> + case MIG_RP_MSG_ERROR:
> + error_setg(&err, "DST indicated error");
> + goto out;
> +
> default:
> break;
> }
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..f11ff7a199 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -553,6 +553,7 @@ void
> migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> char *block_name);
> void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t
> value);
> int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
> +int migrate_send_rp_error(MigrationIncomingState *mis);
>
> void dirty_bitmap_mig_before_vm_start(void);
> void dirty_bitmap_mig_cancel_outgoing(void);
>
>> I suggest you open an issue on :
>>
>> https://gitlab.com/qemu-project/qemu/-/issues/
>>
>> with a detailed description of your environment :
>>
>> Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>>
>> A template is provided when a new issue is created.
>>
>>
>> Thanks,
>>
>> C.
>>
>>
>>
> .
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [Question] VFIO migration will not be aborted in a corner scenario
2025-08-13 12:18 ` Kunkun Jiang via
@ 2025-08-18 6:44 ` Avihai Horon
0 siblings, 0 replies; 11+ messages in thread
From: Avihai Horon @ 2025-08-18 6:44 UTC (permalink / raw)
To: Kunkun Jiang, Cédric Le Goater, Alex Williamson,
Yishai Hadas
Cc: open list:All patches CC here, wanghaibin.wang, Zenghui Yu
On 13/08/2025 15:18, Kunkun Jiang wrote:
> External email: Use caution opening links or attachments
>
>
> Hi Avihai,
>
> On 2025/8/12 22:08, Avihai Horon wrote:
>>
>> On 11/08/2025 19:34, Cédric Le Goater wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hello,
>>>
>>> + Avihai
>>>
>>> On 8/11/25 18:02, Kunkun Jiang wrote:
>>>> Hi all,
>>>>
>>>> While testing VFIO migration, I encountered an corner scenario case:
>>>> VFIO migration will not be aborted when the vfio device of dst-vm
>>>> fails to transition from RESUMING to RUNNING state in
>>>> vfio_vmstate_change.
>>>>
>>>> I saw the comments in the vfio_vmstate_change but I don't understand
>>>> why no action is taken for this situation.
>>>
>>> There is error handling in vfio_vmstate_change() :
>>>
>>> /*
>>> * Migration should be aborted in this case, but
>>> vm_state_notify()
>>> * currently does not support reporting failures.
>>> */
>>> migration_file_set_error(ret, local_err);
>>
>> Hmm, I think this only sets the error on src. On dst we don't have
>> MigrationState->to_dst_file, so we end up just reporting the erro >
>> But even if we did set it, no one is checking if there is a migration
>> error after vm_start() is called in process_incoming_migration_bh().
> Yes, that's what I mean.
>>
>>>
>>>> Allowing the live migration process to continue could cause
>>>> unrecoverable damage to the VM.
>>
>> What do you mean by unrecoverable damage to the VM?
>> If RESUMING->RUNNING transition fails, would a VFIO reset recover the
>> device and allow the VM to continue operation with damage limited only
>> to the VFIO device?
> I didn't express myself clearly, let me explain again.
> In my test, I ran a task inside the VM that was using this vfio device.
> In this corner scenario, the task is aborted immediately and the VM is
> still running normally. I mean it is unacceptable that the task is
> aborted directly.
Ah I understand.
>>
>>>> In this case, can we directly exit the dst-vm? Through the
>>>> return-path mechanism, the src-vm can continue to run.
>>>>
>>>> Looking forward to your reply.
>>>
>> The straightforward solution, as you suggested, is to exit dst upon
>> error in RESUMING->RUNNING transition and notify about it to src through
>> the return-path.
>> However, I am not sure if failing the migration after vm_start() on dst
>> is a bit late (as we start vCPUs and do migration_block_activate, etc.).
> I think vCPUs have not started yet. vCPUs will be started only after the
> all callbacks of vm_state_notify are executed.
Yes, in that case need to refactor vm_prepare_start/vm_state_notify to
handle errors. May be a bit involved, need to further look into that.
Thanks.
> If there are any problems with my analysis, please point them out.
>>
>> But I can think of another way to solve this, hopefully simpler.
>> According to VFIO migration uAPI [1]:
>> * RESUMING -> STOP
>> * Leaving RESUMING terminates a data transfer session and
>> indicates the
>> * device should complete processing of the data delivered by
>> write(). The
>> * kernel migration driver should complete the incorporation of data
>> written
>> * to the data transfer FD into the device internal state and perform
>> * final validity and consistency checking of the new device state.
>> If the
>> * user provided data is found to be incomplete, inconsistent, or
>> otherwise
>> * invalid, the migration driver must fail the SET_STATE ioctl and
>> * optionally go to the ERROR state as described below.
>>
>> So, IIUC, we can add an explicit RESUMING->STOP transition [2] after the
>> device config is loaded (which is the last data the device is expected
>> to receive).
>> If this transition fails, it means something was wrong with migration,
>> and we can send src an error msg via return-path (and not continue to
>> vm_start()).
>>
>> Maybe this approach is less complicated than the first one, and it will
>> also work if src VM was paused prior migration.
>> I already tested some POC and it seems to be working (at least with an
>> artificial error i injected in RESUMING->STOP transition).
>> Kunkun, can you apply the following diff [3] and check if this solves
>> the issue?
> Ok, I'll try it.
>>
>> And in general, what do you think? Should we go with this approach or do
>> you have other ideas?
> My current approach is rather crude. As I said above, I think vcpu has
> not started yet, so I changed it like this.
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 4c06e3db93..70ccb706c6 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -878,7 +878,7 @@ static void vfio_vmstate_change_prepare(void
> *opaque, bool running,
> static void vfio_vmstate_change(void *opaque, bool running, RunState
> state)
> {
> VFIODevice *vbasedev = opaque;
> - enum vfio_device_mig_state new_state;
> + enum vfio_device_mig_state new_state, pre_state;
> Error *local_err = NULL;
> int ret;
>
> @@ -899,6 +899,10 @@ static void vfio_vmstate_change(void *opaque, bool
> running, RunState state)
> * currently does not support reporting failures.
> */
> migration_file_set_error(ret, local_err);
> +
> + if (pre_state == VFIO_DEVICE_STATE_RESUMING) {
> + exit(EXIT_FAILURE);
> + }
> }
>>
>> Thanks.
>>
>> [1]
>> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/vfio.h#L1099
>>
>>
>> [2] Today RESUMING->STOP is done implicitly by the VFIO driver as part
>> of RESUMING->RUNNING transition.
>> [3]
>>
>> diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
>> index e4785031a7..66f8461f02 100644
>> --- a/hw/vfio/migration-multifd.c
>> +++ b/hw/vfio/migration-multifd.c
>> @@ -267,6 +267,12 @@ static bool
>> vfio_load_bufs_thread_load_config(VFIODevice *vbasedev,
>> ret = vfio_load_device_config_state(f_in, vbasedev);
>> bql_unlock();
>>
>> + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> + VFIO_DEVICE_STATE_ERROR, errp);
>> + if (ret) {
>> + return false;
>> + }
>> +
>> if (ret < 0) {
>> error_setg(errp, "%s: vfio_load_device_config_state() failed:
>> %d",
>> vbasedev->name, ret);
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 4c06e3db93..a707d17a5b 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -737,6 +737,8 @@ static int vfio_load_state(QEMUFile *f, void
>> *opaque, int version_id)
>> switch (data) {
>> case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>> {
>> + Error *local_err = NULL;
>> +
>> if (vfio_multifd_transfer_enabled(vbasedev)) {
>> error_report("%s: got DEV_CONFIG_STATE in main
>> migration "
>> "channel but doing multifd transfer",
>> @@ -744,7 +746,19 @@ static int vfio_load_state(QEMUFile *f, void
>> *opaque, int version_id)
>> return -EINVAL;
>> }
>>
>> - return vfio_load_device_config_state(f, opaque);
>> + ret = vfio_load_device_config_state(f, opaque);
>> + if (ret) {
>> + return ret;
>> + }
>> +
>> + ret = vfio_migration_set_state_or_reset(
>> + vbasedev, VFIO_DEVICE_STATE_STOP, &local_err);
>> + if (ret) {
>> + error_report_err(local_err);
>> + return ret;
>> + }
>> +
>> + return 0;
>> }
>> case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>> {
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 10c216d25d..fd498c864d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -91,6 +91,7 @@ enum mig_rp_message_type {
>> MIG_RP_MSG_RECV_BITMAP, /* send recved_bitmap back to source */
>> MIG_RP_MSG_RESUME_ACK, /* tell source that we are ready to
>> resume */
>> MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do
>> switchover */
>> + MIG_RP_MSG_ERROR, /* Tell source that destination encountered an
>> error */
>>
>> MIG_RP_MSG_MAX
>> };
>> @@ -884,6 +885,11 @@ process_incoming_migration_co(void *opaque)
>> ret = qemu_loadvm_state(mis->from_src_file);
>> mis->loadvm_co = NULL;
>>
>> + if (ret) {
>> + migrate_send_rp_error(mis);
>> + error_report("SENT RP ERROR");
>> + }
>> +
>> trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
>>
>> ps = postcopy_state_get();
>> @@ -1126,6 +1132,11 @@ bool migration_has_all_channels(void)
>> return true;
>> }
>> +int migrate_send_rp_error(MigrationIncomingState *mis)
>> +{
>> + return migrate_send_rp_message(mis, MIG_RP_MSG_ERROR, 0, NULL);
>> +}
>> +
>> int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
>> {
>> return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0,
>> NULL);
>> @@ -2614,6 +2625,10 @@ static void *source_return_path_thread(void
>> *opaque)
>> trace_source_return_path_thread_switchover_acked();
>> break;
>>
>> + case MIG_RP_MSG_ERROR:
>> + error_setg(&err, "DST indicated error");
>> + goto out;
>> +
>> default:
>> break;
>> }
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 01329bf824..f11ff7a199 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -553,6 +553,7 @@ void
>> migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>> char *block_name);
>> void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t
>> value);
>> int migrate_send_rp_switchover_ack(MigrationIncomingState *mis);
>> +int migrate_send_rp_error(MigrationIncomingState *mis);
>>
>> void dirty_bitmap_mig_before_vm_start(void);
>> void dirty_bitmap_mig_cancel_outgoing(void);
>>
>>> I suggest you open an issue on :
>>>
>>> https://gitlab.com/qemu-project/qemu/-/issues/
>>>
>>> with a detailed description of your environment :
>>>
>>> Host HW, Host OS, QEMU version, QEMU command line, Guest OS, etc.
>>>
>>> A template is provided when a new issue is created.
>>>
>>>
>>> Thanks,
>>>
>>> C.
>>>
>>>
>>>
>> .
^ permalink raw reply [flat|nested] 11+ messages in thread