* [PATCH 0/5] migration: Notifier fixes for 11.0
@ 2026-01-22 23:03 Peter Xu
2026-01-22 23:03 ` [PATCH 1/5] migration: Add a tracepoint for invoking migration notifiers Peter Xu
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Peter Xu @ 2026-01-22 23:03 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Fabiano Rosas, Prasad Pandit,
peterx
CI: https://gitlab.com/peterx/qemu/-/pipelines/2280356561
Two major goals for this small series:
- Fix postcopy issue where DONE and FAILED notifiers will be invoked twice
- Move FAILED notifier to be before vm_start() if the failure happens
during switchover (where we will stop the VM first)
The 2nd goal will be needed by Stefan's ongoing work on block persistent
reservations, where a fallback should be required on src to happen before
vm_start(). Instead of introducing another FAILED_BEFORE_START, this
patchset should make FAILED work instead.
Patch 1 adds a tracepoint for me to verify this fix.
Patch 2-3 are the real changes of above two.
Patch 3-4 are some cleanups alone the context that we can do, hence
attached at the end.
More details in commit logs individually. Comments welcomed, thanks.
Peter Xu (5):
migration: Add a tracepoint for invoking migration notifiers
migration: Fix double notification of DONE/FAIL for postcopy
migration: Notify migration FAILED before starting VM
migration: Drop explicit block activation in postcopy fail path
migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_*
include/migration/misc.h | 15 ++++++++-------
hw/intc/arm_gicv3_kvm.c | 2 +-
hw/net/virtio-net.c | 4 ++--
hw/vfio/cpr-legacy.c | 2 +-
hw/vfio/cpr.c | 8 ++++----
hw/vfio/migration.c | 4 ++--
migration/cpr-exec.c | 6 +++---
migration/migration.c | 30 +++++++++++++++++++++---------
net/vhost-vdpa.c | 4 ++--
ui/spice-core.c | 7 ++++---
migration/trace-events | 1 +
11 files changed, 49 insertions(+), 34 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/5] migration: Add a tracepoint for invoking migration notifiers
2026-01-22 23:03 [PATCH 0/5] migration: Notifier fixes for 11.0 Peter Xu
@ 2026-01-22 23:03 ` Peter Xu
2026-01-23 12:25 ` Fabiano Rosas
2026-01-22 23:03 ` [PATCH 2/5] migration: Fix double notification of DONE/FAIL for postcopy Peter Xu
` (4 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2026-01-22 23:03 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Fabiano Rosas, Prasad Pandit,
peterx
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 2 ++
migration/trace-events | 1 +
2 files changed, 3 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index 1bcde301f7..47d9189aaf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1706,6 +1706,8 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
GSList *elem, *next;
int ret;
+ trace_migration_call_notifiers(type);
+
e.type = type;
for (elem = migration_state_notifiers[mode]; elem; elem = next) {
diff --git a/migration/trace-events b/migration/trace-events
index bf11b62b17..f437f0a784 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -198,6 +198,7 @@ process_incoming_migration_co_end(int ret) "ret=%d"
process_incoming_migration_co_postcopy_end_main(void) ""
postcopy_preempt_enabled(bool value) "%d"
migration_precopy_complete(void) ""
+migration_call_notifiers(int type) "type=%d"
# migration-stats
migration_transferred_bytes(uint64_t qemu_file, uint64_t multifd, uint64_t rdma) "qemu_file %" PRIu64 " multifd %" PRIu64 " RDMA %" PRIu64
--
2.50.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/5] migration: Fix double notification of DONE/FAIL for postcopy
2026-01-22 23:03 [PATCH 0/5] migration: Notifier fixes for 11.0 Peter Xu
2026-01-22 23:03 ` [PATCH 1/5] migration: Add a tracepoint for invoking migration notifiers Peter Xu
@ 2026-01-22 23:03 ` Peter Xu
2026-01-23 12:52 ` Fabiano Rosas
2026-01-22 23:03 ` [PATCH 3/5] migration: Notify migration FAILED before starting VM Peter Xu
` (3 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2026-01-22 23:03 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Fabiano Rosas, Prasad Pandit,
peterx, Marc-André Lureau, Dr. David Alan Gilbert
Migration notifiers will notify at any of three places: (1) SETUP
phase, (2) migration completes, (3) migration fails.
There's actually a special case for spice: one can refer to
b82fc321bf ("Postcopy+spice: Pass spice migration data earlier"). It
doesn't need another 4th event because in commit 9d9babf78d ("migration:
MigrationEvent for notifiers") we merged it together with the DONE event.
The merge makes some sense if we treat "switchover" of postcopy as "DONE",
however that also means for postcopy we'll notify DONE twice.. The other
one at the end of postcopy when migration_cleanup().
In reality, the current code base will also notify FAILED for postcopy
twice. It's because an (maybe accidental) change in commit
4af667f87c ("migration: notifier error checking").
First of all, we still need that notification when switchover as stated in
Dave's commit, however that's only needed for spice. To fix it, introduce
POSTCOPY_START event to differenciate it from DONE. Use that instead in
postcopy_start(). Then spice will need to capture this event too.
Then we remove the extra FAILED notification in postcopy_start().
If one wonder if other DONE users should also monitor POSTCOPY_START
event.. We have two more DONE users:
- kvm_arm_gicv3_notifier
- cpr_exec_notifier
Both of them do not need a notification for POSTCOPY_START, but only when
migration completed. Actually, both of them are used in CPR, which doesn't
support postcopy.
I didn't attach Fixes: because I am not aware of any real bug on such
double reporting. I'm wildly guessing the 2nd notify might be silently
ignored in many cases. However this is still worth fixing.
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/misc.h | 1 +
migration/migration.c | 3 +--
ui/spice-core.c | 3 ++-
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index e26d418a6e..b002466e10 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -63,6 +63,7 @@ typedef enum MigrationEventType {
MIG_EVENT_PRECOPY_SETUP,
MIG_EVENT_PRECOPY_DONE,
MIG_EVENT_PRECOPY_FAILED,
+ MIG_EVENT_POSTCOPY_START,
MIG_EVENT_MAX
} MigrationEventType;
diff --git a/migration/migration.c b/migration/migration.c
index 47d9189aaf..91775f8472 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2857,7 +2857,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
* at the transition to postcopy and after the device state; in particular
* spice needs to trigger a transition now
*/
- migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE, NULL);
+ migration_call_notifiers(ms, MIG_EVENT_POSTCOPY_START, NULL);
migration_downtime_end(ms);
@@ -2906,7 +2906,6 @@ fail:
migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
}
migration_block_activate(NULL);
- migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
bql_unlock();
return -1;
}
diff --git a/ui/spice-core.c b/ui/spice-core.c
index 8a6050f4ae..ce3c2954e3 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -585,7 +585,8 @@ static int migration_state_notifier(NotifierWithReturn *notifier,
if (e->type == MIG_EVENT_PRECOPY_SETUP) {
spice_server_migrate_start(spice_server);
- } else if (e->type == MIG_EVENT_PRECOPY_DONE) {
+ } else if (e->type == MIG_EVENT_PRECOPY_DONE ||
+ e->type == MIG_EVENT_POSTCOPY_START) {
spice_server_migrate_end(spice_server, true);
spice_have_target_host = false;
} else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
--
2.50.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/5] migration: Notify migration FAILED before starting VM
2026-01-22 23:03 [PATCH 0/5] migration: Notifier fixes for 11.0 Peter Xu
2026-01-22 23:03 ` [PATCH 1/5] migration: Add a tracepoint for invoking migration notifiers Peter Xu
2026-01-22 23:03 ` [PATCH 2/5] migration: Fix double notification of DONE/FAIL for postcopy Peter Xu
@ 2026-01-22 23:03 ` Peter Xu
2026-01-23 12:59 ` Fabiano Rosas
2026-01-22 23:03 ` [PATCH 4/5] migration: Drop explicit block activation in postcopy fail path Peter Xu
` (2 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2026-01-22 23:03 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Fabiano Rosas, Prasad Pandit,
peterx, Cédric Le Goater, Marc-André Lureau
Devices may opt-in migration FAILED notifiers to be invoked when migration
fails. Currently, the notifications happen in migration_cleanup(). It is
normally fine, but maybe not ideal if there's dependency of the fallback
v.s. VM starts.
This patch moves the FAILED notification earlier, so that if the failure
happened during switchover, it'll notify before VM restart.
After walking over all existing FAILED notifier users, I got the conclusion
that this should also be a cleaner approach at least from design POV.
We have these notifier users, where the first two do not need to trap
FAILED:
|----------------------------+-------------------------------------+---------------------|
| device | handler | events needed |
|----------------------------+-------------------------------------+---------------------|
| gicv3 | kvm_arm_gicv3_notifier | DONE |
| vfio_iommufd / vfio_legacy | vfio_cpr_reboot_notifier | SETUP |
| cpr-exec | cpr_exec_notifier | FAILED, DONE |
| virtio-net | virtio_net_migration_state_notifier | SETUP, FAILED |
| vfio | vfio_migration_state_notifier | FAILED |
| vdpa | vdpa_net_migration_state_notifier | SETUP, FAILED |
| spice [*] | migration_state_notifier | SETUP, FAILED, DONE |
|----------------------------+-------------------------------------+---------------------|
For cpr-exec, it tries to cleanup some cpr-exec specific fd or env
variables. This should be fine either way, as long as before
migration_cleanup().
For virtio-net, we need to re-plug the primary device back to guest in the
failover mode. Likely benign.
VFIO needs to re-start the device if FAILED. IIUC it should do it before
vm_start(), if the VFIO device can be put into a STOPed state due to
migration, we should logically make it running again before vCPUs run.
VDPA will disable SVQ when migration is FAILED. Likely benign too, but
looks better if we can do it before resuming vCPUs.
For spice, we should rely on "spice_server_migrate_end(false)" to retake
the ownership. Benign, but looks more reasonable if the spice client does
it before VM runs again.
Note that this change may introduce slightly more downtime, if the
migration failed exactly at the switchover phase. But that's very rare,
and even if it happens, none of above expects a long delay, but a short
one, likely will be buried in the total downtime even if failed.
Cc: Cédric Le Goater <clg@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 91775f8472..1d9a2fc068 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1481,7 +1481,6 @@ static void migration_cleanup_json_writer(MigrationState *s)
static void migration_cleanup(MigrationState *s)
{
- MigrationEventType type;
QEMUFile *tmp = NULL;
trace_migration_cleanup();
@@ -1535,9 +1534,15 @@ static void migration_cleanup(MigrationState *s)
/* It is used on info migrate. We can't free it */
error_report_err(error_copy(s->error));
}
- type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
- MIG_EVENT_PRECOPY_DONE;
- migration_call_notifiers(s, type, NULL);
+
+ /*
+ * FAILED notification should have already happened. Notify DONE if
+ * migration completed successfully.
+ */
+ if (!migration_has_failed(s)) {
+ migration_call_notifiers(s, MIG_EVENT_PRECOPY_DONE, NULL);
+ }
+
yank_unregister_instance(MIGRATION_YANK_INSTANCE);
}
@@ -3589,6 +3594,13 @@ static void migration_iteration_finish(MigrationState *s)
error_free(local_err);
break;
}
+
+ /*
+ * Notify FAILED before starting VM, so that devices can invoke
+ * necessary fallbacks before vCPUs run again.
+ */
+ migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
+
if (runstate_is_live(s->vm_old_state)) {
if (!runstate_check(RUN_STATE_SHUTDOWN)) {
vm_start();
--
2.50.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/5] migration: Drop explicit block activation in postcopy fail path
2026-01-22 23:03 [PATCH 0/5] migration: Notifier fixes for 11.0 Peter Xu
` (2 preceding siblings ...)
2026-01-22 23:03 ` [PATCH 3/5] migration: Notify migration FAILED before starting VM Peter Xu
@ 2026-01-22 23:03 ` Peter Xu
2026-01-23 12:59 ` Fabiano Rosas
2026-01-22 23:03 ` [PATCH 5/5] migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_* Peter Xu
2026-01-26 15:58 ` [PATCH 0/5] migration: Notifier fixes for 11.0 Stefan Hajnoczi
5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2026-01-22 23:03 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Fabiano Rosas, Prasad Pandit,
peterx
Postcopy (in failure path) should share with precopy on disk reactivations.
Explicit activiation should used to be fine even if called twice, but after
26f65c01ed ("migration: Do not try to start VM if disk activation fails")
we may want to avoid it and always capture failure when reactivation
happens (even if we do not expect the failure to happen). Remove this
redundant call.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/migration/migration.c b/migration/migration.c
index 1d9a2fc068..5bef14ea99 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2910,7 +2910,6 @@ fail:
if (ms->state != MIGRATION_STATUS_CANCELLING) {
migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
}
- migration_block_activate(NULL);
bql_unlock();
return -1;
}
--
2.50.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 5/5] migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_*
2026-01-22 23:03 [PATCH 0/5] migration: Notifier fixes for 11.0 Peter Xu
` (3 preceding siblings ...)
2026-01-22 23:03 ` [PATCH 4/5] migration: Drop explicit block activation in postcopy fail path Peter Xu
@ 2026-01-22 23:03 ` Peter Xu
2026-01-23 13:02 ` Fabiano Rosas
2026-01-26 15:58 ` [PATCH 0/5] migration: Notifier fixes for 11.0 Stefan Hajnoczi
5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2026-01-22 23:03 UTC (permalink / raw)
To: qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Fabiano Rosas, Prasad Pandit,
peterx
All three events are shared between precopy and postcopy, rather than
precopy specific.
For example, both precopy and postcopy will go through a SETUP process.
Meanwhile, both FAILED and DONE notifiers will be notified for either
precopy or postcopy on completions / failures.
Rename them to make them match what they do, and shorter.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/misc.h | 14 +++++++-------
hw/intc/arm_gicv3_kvm.c | 2 +-
hw/net/virtio-net.c | 4 ++--
hw/vfio/cpr-legacy.c | 2 +-
hw/vfio/cpr.c | 8 ++++----
hw/vfio/migration.c | 4 ++--
migration/cpr-exec.c | 6 +++---
migration/migration.c | 8 ++++----
net/vhost-vdpa.c | 4 ++--
ui/spice-core.c | 6 +++---
10 files changed, 29 insertions(+), 29 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index b002466e10..766de998cb 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,10 +60,10 @@ bool migration_is_running(void);
bool migration_thread_is_self(void);
typedef enum MigrationEventType {
- MIG_EVENT_PRECOPY_SETUP,
- MIG_EVENT_PRECOPY_DONE,
- MIG_EVENT_PRECOPY_FAILED,
+ MIG_EVENT_SETUP,
MIG_EVENT_POSTCOPY_START,
+ MIG_EVENT_DONE,
+ MIG_EVENT_FAILED,
MIG_EVENT_MAX
} MigrationEventType;
@@ -73,7 +73,7 @@ typedef struct MigrationEvent {
/*
* A MigrationNotifyFunc may return an error code and an Error object,
- * but only when @e->type is MIG_EVENT_PRECOPY_SETUP. The code is an int
+ * but only when @e->type is MIG_EVENT_SETUP. The code is an int
* to allow for different failure modes and recovery actions.
*/
typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
@@ -83,9 +83,9 @@ typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
* Register the notifier @notify to be called when a migration event occurs
* for MIG_MODE_NORMAL, as specified by the MigrationEvent passed to @func.
* Notifiers may receive events in any of the following orders:
- * - MIG_EVENT_PRECOPY_SETUP -> MIG_EVENT_PRECOPY_DONE
- * - MIG_EVENT_PRECOPY_SETUP -> MIG_EVENT_PRECOPY_FAILED
- * - MIG_EVENT_PRECOPY_FAILED
+ * - MIG_EVENT_SETUP -> MIG_EVENT_DONE
+ * - MIG_EVENT_SETUP -> MIG_EVENT_FAILED
+ * - MIG_EVENT_FAILED
*/
void migration_add_notifier(NotifierWithReturn *notify,
MigrationNotifyFunc func);
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 6f311e37ef..fddeefa26f 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -774,7 +774,7 @@ static void vm_change_state_handler(void *opaque, bool running,
static int kvm_arm_gicv3_notifier(NotifierWithReturn *notifier,
MigrationEvent *e, Error **errp)
{
- if (e->type == MIG_EVENT_PRECOPY_DONE) {
+ if (e->type == MIG_EVENT_DONE) {
GICv3State *s = container_of(notifier, GICv3State, cpr_notifier);
return kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES,
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 317f1ad23b..3e2dc30da6 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3786,7 +3786,7 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationEvent *e)
should_be_hidden = qatomic_read(&n->failover_primary_hidden);
- if (e->type == MIG_EVENT_PRECOPY_SETUP && !should_be_hidden) {
+ if (e->type == MIG_EVENT_SETUP && !should_be_hidden) {
if (failover_unplug_primary(n, dev)) {
vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
qapi_event_send_unplug_primary(dev->id);
@@ -3794,7 +3794,7 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationEvent *e)
} else {
warn_report("couldn't unplug primary device");
}
- } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
+ } else if (e->type == MIG_EVENT_FAILED) {
/* We already unplugged the device let's plug it back */
if (!failover_replug_primary(n, dev, &err)) {
if (err) {
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 7c03ddb961..033a546c30 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -137,7 +137,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
container_of(notifier, VFIOLegacyContainer, cpr.transfer_notifier);
VFIOContainer *bcontainer = VFIO_IOMMU(container);
- if (e->type != MIG_EVENT_PRECOPY_FAILED) {
+ if (e->type != MIG_EVENT_FAILED) {
return 0;
}
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index 998230d271..ffa4f8e099 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -18,7 +18,7 @@
int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
MigrationEvent *e, Error **errp)
{
- if (e->type == MIG_EVENT_PRECOPY_SETUP &&
+ if (e->type == MIG_EVENT_SETUP &&
!runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
error_setg(errp,
@@ -186,7 +186,7 @@ static int vfio_cpr_kvm_close_notifier(NotifierWithReturn *notifier,
MigrationEvent *e,
Error **errp)
{
- if (e->type == MIG_EVENT_PRECOPY_DONE) {
+ if (e->type == MIG_EVENT_DONE) {
vfio_kvm_device_close();
}
return 0;
@@ -272,9 +272,9 @@ static int vfio_cpr_pci_notifier(NotifierWithReturn *notifier,
VFIOPCIDevice *vdev =
container_of(notifier, VFIOPCIDevice, cpr.transfer_notifier);
- if (e->type == MIG_EVENT_PRECOPY_SETUP) {
+ if (e->type == MIG_EVENT_SETUP) {
return vfio_cpr_set_msi_virq(vdev, errp, false);
- } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
+ } else if (e->type == MIG_EVENT_FAILED) {
return vfio_cpr_set_msi_virq(vdev, errp, true);
}
return 0;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index f857dc25ed..76a902b79c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -917,10 +917,10 @@ static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
trace_vfio_migration_state_notifier(vbasedev->name, e->type);
- if (e->type == MIG_EVENT_PRECOPY_FAILED) {
+ if (e->type == MIG_EVENT_FAILED) {
/*
* MigrationNotifyFunc may not return an error code and an Error
- * object for MIG_EVENT_PRECOPY_FAILED. Hence, report the error
+ * object for MIG_EVENT_FAILED. Hence, report the error
* locally and ignore the errp argument.
*/
ret = vfio_migration_set_state_or_reset(vbasedev,
diff --git a/migration/cpr-exec.c b/migration/cpr-exec.c
index da287d8031..18a71828c3 100644
--- a/migration/cpr-exec.c
+++ b/migration/cpr-exec.c
@@ -164,7 +164,7 @@ static void cpr_exec_cb(void *opaque)
err = NULL;
/* Note, we can go from state COMPLETED to FAILED */
- migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
+ migration_call_notifiers(s, MIG_EVENT_FAILED, NULL);
if (!migration_block_activate(&err)) {
/* error was already reported */
@@ -182,12 +182,12 @@ static int cpr_exec_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
{
MigrationState *s = migrate_get_current();
- if (e->type == MIG_EVENT_PRECOPY_DONE) {
+ if (e->type == MIG_EVENT_DONE) {
QEMUBH *cpr_exec_bh = qemu_bh_new(cpr_exec_cb, NULL);
assert(s->state == MIGRATION_STATUS_COMPLETED);
qemu_bh_schedule(cpr_exec_bh);
qemu_notify_event();
- } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
+ } else if (e->type == MIG_EVENT_FAILED) {
cpr_exec_unpersist_state();
}
return 0;
diff --git a/migration/migration.c b/migration/migration.c
index 5bef14ea99..7ba37afb27 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1540,7 +1540,7 @@ static void migration_cleanup(MigrationState *s)
* migration completed successfully.
*/
if (!migration_has_failed(s)) {
- migration_call_notifiers(s, MIG_EVENT_PRECOPY_DONE, NULL);
+ migration_call_notifiers(s, MIG_EVENT_DONE, NULL);
}
yank_unregister_instance(MIGRATION_YANK_INSTANCE);
@@ -1720,7 +1720,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
notifier = (NotifierWithReturn *)elem->data;
ret = notifier->notify(notifier, &e, errp);
if (ret) {
- assert(type == MIG_EVENT_PRECOPY_SETUP);
+ assert(type == MIG_EVENT_SETUP);
return ret;
}
}
@@ -3598,7 +3598,7 @@ static void migration_iteration_finish(MigrationState *s)
* Notify FAILED before starting VM, so that devices can invoke
* necessary fallbacks before vCPUs run again.
*/
- migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
+ migration_call_notifiers(s, MIG_EVENT_FAILED, NULL);
if (runstate_is_live(s->vm_old_state)) {
if (!runstate_check(RUN_STATE_SHUTDOWN)) {
@@ -4064,7 +4064,7 @@ void migration_connect(MigrationState *s, Error *error_in)
rate_limit = migrate_max_bandwidth();
/* Notify before starting migration thread */
- if (migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) {
+ if (migration_call_notifiers(s, MIG_EVENT_SETUP, &local_err)) {
goto fail;
}
}
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 74d26a9497..f4b1f0e9e0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -378,9 +378,9 @@ static int vdpa_net_migration_state_notifier(NotifierWithReturn *notifier,
{
VhostVDPAState *s = container_of(notifier, VhostVDPAState, migration_state);
- if (e->type == MIG_EVENT_PRECOPY_SETUP) {
+ if (e->type == MIG_EVENT_SETUP) {
vhost_vdpa_net_log_global_enable(s, true);
- } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
+ } else if (e->type == MIG_EVENT_FAILED) {
vhost_vdpa_net_log_global_enable(s, false);
}
return 0;
diff --git a/ui/spice-core.c b/ui/spice-core.c
index ce3c2954e3..ee13ecc4a5 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -583,13 +583,13 @@ static int migration_state_notifier(NotifierWithReturn *notifier,
return 0;
}
- if (e->type == MIG_EVENT_PRECOPY_SETUP) {
+ if (e->type == MIG_EVENT_SETUP) {
spice_server_migrate_start(spice_server);
- } else if (e->type == MIG_EVENT_PRECOPY_DONE ||
+ } else if (e->type == MIG_EVENT_DONE ||
e->type == MIG_EVENT_POSTCOPY_START) {
spice_server_migrate_end(spice_server, true);
spice_have_target_host = false;
- } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
+ } else if (e->type == MIG_EVENT_FAILED) {
spice_server_migrate_end(spice_server, false);
spice_have_target_host = false;
}
--
2.50.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/5] migration: Add a tracepoint for invoking migration notifiers
2026-01-22 23:03 ` [PATCH 1/5] migration: Add a tracepoint for invoking migration notifiers Peter Xu
@ 2026-01-23 12:25 ` Fabiano Rosas
0 siblings, 0 replies; 18+ messages in thread
From: Fabiano Rosas @ 2026-01-23 12:25 UTC (permalink / raw)
To: Peter Xu, qemu-devel; +Cc: Juraj Marcin, Stefan Hajnoczi, Prasad Pandit, peterx
Peter Xu <peterx@redhat.com> writes:
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.c | 2 ++
> migration/trace-events | 1 +
> 2 files changed, 3 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 1bcde301f7..47d9189aaf 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1706,6 +1706,8 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
> GSList *elem, *next;
> int ret;
>
> + trace_migration_call_notifiers(type);
> +
> e.type = type;
>
> for (elem = migration_state_notifiers[mode]; elem; elem = next) {
> diff --git a/migration/trace-events b/migration/trace-events
> index bf11b62b17..f437f0a784 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -198,6 +198,7 @@ process_incoming_migration_co_end(int ret) "ret=%d"
> process_incoming_migration_co_postcopy_end_main(void) ""
> postcopy_preempt_enabled(bool value) "%d"
> migration_precopy_complete(void) ""
> +migration_call_notifiers(int type) "type=%d"
>
> # migration-stats
> migration_transferred_bytes(uint64_t qemu_file, uint64_t multifd, uint64_t rdma) "qemu_file %" PRIu64 " multifd %" PRIu64 " RDMA %" PRIu64
Reviewed-by: Fabiano Rosas <farosas@suse.de>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/5] migration: Fix double notification of DONE/FAIL for postcopy
2026-01-22 23:03 ` [PATCH 2/5] migration: Fix double notification of DONE/FAIL for postcopy Peter Xu
@ 2026-01-23 12:52 ` Fabiano Rosas
2026-01-23 12:54 ` Fabiano Rosas
2026-01-23 14:58 ` Peter Xu
0 siblings, 2 replies; 18+ messages in thread
From: Fabiano Rosas @ 2026-01-23 12:52 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Prasad Pandit, peterx,
Marc-André Lureau, Dr. David Alan Gilbert
Peter Xu <peterx@redhat.com> writes:
> Migration notifiers will notify at any of three places: (1) SETUP
> phase, (2) migration completes, (3) migration fails.
>
> There's actually a special case for spice: one can refer to
> b82fc321bf ("Postcopy+spice: Pass spice migration data earlier"). It
> doesn't need another 4th event because in commit 9d9babf78d ("migration:
> MigrationEvent for notifiers") we merged it together with the DONE event.
>
> The merge makes some sense if we treat "switchover" of postcopy as "DONE",
> however that also means for postcopy we'll notify DONE twice.. The other
> one at the end of postcopy when migration_cleanup().
>
> In reality, the current code base will also notify FAILED for postcopy
> twice. It's because an (maybe accidental) change in commit
> 4af667f87c ("migration: notifier error checking").
>
> First of all, we still need that notification when switchover as stated in
> Dave's commit, however that's only needed for spice. To fix it, introduce
> POSTCOPY_START event to differenciate it from DONE. Use that instead in
> postcopy_start(). Then spice will need to capture this event too.
>
> Then we remove the extra FAILED notification in postcopy_start().
>
> If one wonder if other DONE users should also monitor POSTCOPY_START
> event.. We have two more DONE users:
>
> - kvm_arm_gicv3_notifier
> - cpr_exec_notifier
>
> Both of them do not need a notification for POSTCOPY_START, but only when
> migration completed. Actually, both of them are used in CPR, which doesn't
> support postcopy.
>
> I didn't attach Fixes: because I am not aware of any real bug on such
> double reporting. I'm wildly guessing the 2nd notify might be silently
> ignored in many cases. However this is still worth fixing.
>
> Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> include/migration/misc.h | 1 +
> migration/migration.c | 3 +--
> ui/spice-core.c | 3 ++-
> 3 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index e26d418a6e..b002466e10 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -63,6 +63,7 @@ typedef enum MigrationEventType {
> MIG_EVENT_PRECOPY_SETUP,
> MIG_EVENT_PRECOPY_DONE,
> MIG_EVENT_PRECOPY_FAILED,
> + MIG_EVENT_POSTCOPY_START,
> MIG_EVENT_MAX
> } MigrationEventType;
With the new state there's a doc text to update at
migration_add_notifier below.
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 47d9189aaf..91775f8472 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2857,7 +2857,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
> * at the transition to postcopy and after the device state; in particular
> * spice needs to trigger a transition now
> */
> - migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE, NULL);
> + migration_call_notifiers(ms, MIG_EVENT_POSTCOPY_START, NULL);
>
> migration_downtime_end(ms);
>
> @@ -2906,7 +2906,6 @@ fail:
> migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> }
> migration_block_activate(NULL);
> - migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
Does anything consuming the FAILED event expects it to be issued before
the vm_start() at migration_iteration_finish?
migration_iteration_finish:
bql_lock();
switch (s->state) {
case MIGRATION_STATUS_FAILED:
if (!migration_block_activate(&local_err)) {
error_free(local_err);
break;
}
if (runstate_is_live(s->vm_old_state)) {
if (!runstate_check(RUN_STATE_SHUTDOWN)) {
vm_start();
}
} else {
if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
runstate_set(s->vm_old_state);
}
}
The extra migration_block_activate() in postcopy_start() also seems
wrong, no?
> bql_unlock();
> return -1;
> }
> diff --git a/ui/spice-core.c b/ui/spice-core.c
> index 8a6050f4ae..ce3c2954e3 100644
> --- a/ui/spice-core.c
> +++ b/ui/spice-core.c
> @@ -585,7 +585,8 @@ static int migration_state_notifier(NotifierWithReturn *notifier,
>
> if (e->type == MIG_EVENT_PRECOPY_SETUP) {
> spice_server_migrate_start(spice_server);
> - } else if (e->type == MIG_EVENT_PRECOPY_DONE) {
> + } else if (e->type == MIG_EVENT_PRECOPY_DONE ||
> + e->type == MIG_EVENT_POSTCOPY_START) {
> spice_server_migrate_end(spice_server, true);
> spice_have_target_host = false;
> } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/5] migration: Fix double notification of DONE/FAIL for postcopy
2026-01-23 12:52 ` Fabiano Rosas
@ 2026-01-23 12:54 ` Fabiano Rosas
2026-01-23 14:58 ` Peter Xu
1 sibling, 0 replies; 18+ messages in thread
From: Fabiano Rosas @ 2026-01-23 12:54 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Prasad Pandit, peterx,
Marc-André Lureau, Dr. David Alan Gilbert
Fabiano Rosas <farosas@suse.de> writes:
> Peter Xu <peterx@redhat.com> writes:
>
>> Migration notifiers will notify at any of three places: (1) SETUP
>> phase, (2) migration completes, (3) migration fails.
>>
>> There's actually a special case for spice: one can refer to
>> b82fc321bf ("Postcopy+spice: Pass spice migration data earlier"). It
>> doesn't need another 4th event because in commit 9d9babf78d ("migration:
>> MigrationEvent for notifiers") we merged it together with the DONE event.
>>
>> The merge makes some sense if we treat "switchover" of postcopy as "DONE",
>> however that also means for postcopy we'll notify DONE twice.. The other
>> one at the end of postcopy when migration_cleanup().
>>
>> In reality, the current code base will also notify FAILED for postcopy
>> twice. It's because an (maybe accidental) change in commit
>> 4af667f87c ("migration: notifier error checking").
>>
>> First of all, we still need that notification when switchover as stated in
>> Dave's commit, however that's only needed for spice. To fix it, introduce
>> POSTCOPY_START event to differenciate it from DONE. Use that instead in
>> postcopy_start(). Then spice will need to capture this event too.
>>
>> Then we remove the extra FAILED notification in postcopy_start().
>>
>> If one wonder if other DONE users should also monitor POSTCOPY_START
>> event.. We have two more DONE users:
>>
>> - kvm_arm_gicv3_notifier
>> - cpr_exec_notifier
>>
>> Both of them do not need a notification for POSTCOPY_START, but only when
>> migration completed. Actually, both of them are used in CPR, which doesn't
>> support postcopy.
>>
>> I didn't attach Fixes: because I am not aware of any real bug on such
>> double reporting. I'm wildly guessing the 2nd notify might be silently
>> ignored in many cases. However this is still worth fixing.
>>
>> Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
>> Cc: Dr. David Alan Gilbert <dave@treblig.org>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>> include/migration/misc.h | 1 +
>> migration/migration.c | 3 +--
>> ui/spice-core.c | 3 ++-
>> 3 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index e26d418a6e..b002466e10 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -63,6 +63,7 @@ typedef enum MigrationEventType {
>> MIG_EVENT_PRECOPY_SETUP,
>> MIG_EVENT_PRECOPY_DONE,
>> MIG_EVENT_PRECOPY_FAILED,
>> + MIG_EVENT_POSTCOPY_START,
>> MIG_EVENT_MAX
>> } MigrationEventType;
>
> With the new state there's a doc text to update at
> migration_add_notifier below.
>
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 47d9189aaf..91775f8472 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2857,7 +2857,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>> * at the transition to postcopy and after the device state; in particular
>> * spice needs to trigger a transition now
>> */
>> - migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE, NULL);
>> + migration_call_notifiers(ms, MIG_EVENT_POSTCOPY_START, NULL);
>>
>> migration_downtime_end(ms);
>>
>> @@ -2906,7 +2906,6 @@ fail:
>> migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
>> }
>> migration_block_activate(NULL);
>> - migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
>
> Does anything consuming the FAILED event expects it to be issued before
> the vm_start() at migration_iteration_finish?
>
> migration_iteration_finish:
>
> bql_lock();
> switch (s->state) {
> case MIGRATION_STATUS_FAILED:
> if (!migration_block_activate(&local_err)) {
> error_free(local_err);
> break;
> }
> if (runstate_is_live(s->vm_old_state)) {
> if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> vm_start();
> }
> } else {
> if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
> runstate_set(s->vm_old_state);
> }
> }
>
> The extra migration_block_activate() in postcopy_start() also seems
> wrong, no?
>
Heh, nevermind...
>> bql_unlock();
>> return -1;
>> }
>> diff --git a/ui/spice-core.c b/ui/spice-core.c
>> index 8a6050f4ae..ce3c2954e3 100644
>> --- a/ui/spice-core.c
>> +++ b/ui/spice-core.c
>> @@ -585,7 +585,8 @@ static int migration_state_notifier(NotifierWithReturn *notifier,
>>
>> if (e->type == MIG_EVENT_PRECOPY_SETUP) {
>> spice_server_migrate_start(spice_server);
>> - } else if (e->type == MIG_EVENT_PRECOPY_DONE) {
>> + } else if (e->type == MIG_EVENT_PRECOPY_DONE ||
>> + e->type == MIG_EVENT_POSTCOPY_START) {
>> spice_server_migrate_end(spice_server, true);
>> spice_have_target_host = false;
>> } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/5] migration: Notify migration FAILED before starting VM
2026-01-22 23:03 ` [PATCH 3/5] migration: Notify migration FAILED before starting VM Peter Xu
@ 2026-01-23 12:59 ` Fabiano Rosas
2026-01-23 15:40 ` Peter Xu
0 siblings, 1 reply; 18+ messages in thread
From: Fabiano Rosas @ 2026-01-23 12:59 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Juraj Marcin, Stefan Hajnoczi, Prasad Pandit, peterx,
Cédric Le Goater, Marc-André Lureau
Peter Xu <peterx@redhat.com> writes:
> Devices may opt-in migration FAILED notifiers to be invoked when migration
> fails. Currently, the notifications happen in migration_cleanup(). It is
> normally fine, but maybe not ideal if there's dependency of the fallback
> v.s. VM starts.
>
> This patch moves the FAILED notification earlier, so that if the failure
> happened during switchover, it'll notify before VM restart.
>
The change to FAILED in patch 2 should come to this patch to avoid
having a window where the notification only happens at the end.
> After walking over all existing FAILED notifier users, I got the conclusion
> that this should also be a cleaner approach at least from design POV.
>
> We have these notifier users, where the first two do not need to trap
> FAILED:
>
> |----------------------------+-------------------------------------+---------------------|
> | device | handler | events needed |
> |----------------------------+-------------------------------------+---------------------|
> | gicv3 | kvm_arm_gicv3_notifier | DONE |
> | vfio_iommufd / vfio_legacy | vfio_cpr_reboot_notifier | SETUP |
> | cpr-exec | cpr_exec_notifier | FAILED, DONE |
> | virtio-net | virtio_net_migration_state_notifier | SETUP, FAILED |
> | vfio | vfio_migration_state_notifier | FAILED |
> | vdpa | vdpa_net_migration_state_notifier | SETUP, FAILED |
> | spice [*] | migration_state_notifier | SETUP, FAILED, DONE |
> |----------------------------+-------------------------------------+---------------------|
>
> For cpr-exec, it tries to cleanup some cpr-exec specific fd or env
> variables. This should be fine either way, as long as before
> migration_cleanup().
>
> For virtio-net, we need to re-plug the primary device back to guest in the
> failover mode. Likely benign.
>
> VFIO needs to re-start the device if FAILED. IIUC it should do it before
> vm_start(), if the VFIO device can be put into a STOPed state due to
> migration, we should logically make it running again before vCPUs run.
>
> VDPA will disable SVQ when migration is FAILED. Likely benign too, but
> looks better if we can do it before resuming vCPUs.
>
> For spice, we should rely on "spice_server_migrate_end(false)" to retake
> the ownership. Benign, but looks more reasonable if the spice client does
> it before VM runs again.
>
> Note that this change may introduce slightly more downtime, if the
> migration failed exactly at the switchover phase. But that's very rare,
> and even if it happens, none of above expects a long delay, but a short
> one, likely will be buried in the total downtime even if failed.
>
> Cc: Cédric Le Goater <clg@redhat.com>
> Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 91775f8472..1d9a2fc068 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1481,7 +1481,6 @@ static void migration_cleanup_json_writer(MigrationState *s)
>
> static void migration_cleanup(MigrationState *s)
> {
> - MigrationEventType type;
> QEMUFile *tmp = NULL;
>
> trace_migration_cleanup();
> @@ -1535,9 +1534,15 @@ static void migration_cleanup(MigrationState *s)
> /* It is used on info migrate. We can't free it */
> error_report_err(error_copy(s->error));
> }
> - type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
> - MIG_EVENT_PRECOPY_DONE;
> - migration_call_notifiers(s, type, NULL);
> +
> + /*
> + * FAILED notification should have already happened. Notify DONE if
> + * migration completed successfully.
> + */
> + if (!migration_has_failed(s)) {
> + migration_call_notifiers(s, MIG_EVENT_PRECOPY_DONE, NULL);
> + }
> +
> yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> }
>
> @@ -3589,6 +3594,13 @@ static void migration_iteration_finish(MigrationState *s)
> error_free(local_err);
> break;
> }
> +
> + /*
> + * Notify FAILED before starting VM, so that devices can invoke
> + * necessary fallbacks before vCPUs run again.
> + */
> + migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
> +
> if (runstate_is_live(s->vm_old_state)) {
> if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> vm_start();
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 4/5] migration: Drop explicit block activation in postcopy fail path
2026-01-22 23:03 ` [PATCH 4/5] migration: Drop explicit block activation in postcopy fail path Peter Xu
@ 2026-01-23 12:59 ` Fabiano Rosas
0 siblings, 0 replies; 18+ messages in thread
From: Fabiano Rosas @ 2026-01-23 12:59 UTC (permalink / raw)
To: Peter Xu, qemu-devel; +Cc: Juraj Marcin, Stefan Hajnoczi, Prasad Pandit, peterx
Peter Xu <peterx@redhat.com> writes:
> Postcopy (in failure path) should share with precopy on disk reactivations.
> Explicit activiation should used to be fine even if called twice, but after
> 26f65c01ed ("migration: Do not try to start VM if disk activation fails")
> we may want to avoid it and always capture failure when reactivation
> happens (even if we do not expect the failure to happen). Remove this
> redundant call.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 1d9a2fc068..5bef14ea99 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2910,7 +2910,6 @@ fail:
> if (ms->state != MIGRATION_STATUS_CANCELLING) {
> migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> }
> - migration_block_activate(NULL);
> bql_unlock();
> return -1;
> }
Reviewed-by: Fabiano Rosas <farosas@suse.de>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 5/5] migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_*
2026-01-22 23:03 ` [PATCH 5/5] migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_* Peter Xu
@ 2026-01-23 13:02 ` Fabiano Rosas
0 siblings, 0 replies; 18+ messages in thread
From: Fabiano Rosas @ 2026-01-23 13:02 UTC (permalink / raw)
To: Peter Xu, qemu-devel; +Cc: Juraj Marcin, Stefan Hajnoczi, Prasad Pandit, peterx
Peter Xu <peterx@redhat.com> writes:
> All three events are shared between precopy and postcopy, rather than
> precopy specific.
>
> For example, both precopy and postcopy will go through a SETUP process.
>
> Meanwhile, both FAILED and DONE notifiers will be notified for either
> precopy or postcopy on completions / failures.
>
> Rename them to make them match what they do, and shorter.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> include/migration/misc.h | 14 +++++++-------
> hw/intc/arm_gicv3_kvm.c | 2 +-
> hw/net/virtio-net.c | 4 ++--
> hw/vfio/cpr-legacy.c | 2 +-
> hw/vfio/cpr.c | 8 ++++----
> hw/vfio/migration.c | 4 ++--
> migration/cpr-exec.c | 6 +++---
> migration/migration.c | 8 ++++----
> net/vhost-vdpa.c | 4 ++--
> ui/spice-core.c | 6 +++---
> 10 files changed, 29 insertions(+), 29 deletions(-)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index b002466e10..766de998cb 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -60,10 +60,10 @@ bool migration_is_running(void);
> bool migration_thread_is_self(void);
>
> typedef enum MigrationEventType {
> - MIG_EVENT_PRECOPY_SETUP,
> - MIG_EVENT_PRECOPY_DONE,
> - MIG_EVENT_PRECOPY_FAILED,
> + MIG_EVENT_SETUP,
> MIG_EVENT_POSTCOPY_START,
> + MIG_EVENT_DONE,
> + MIG_EVENT_FAILED,
> MIG_EVENT_MAX
> } MigrationEventType;
>
> @@ -73,7 +73,7 @@ typedef struct MigrationEvent {
>
> /*
> * A MigrationNotifyFunc may return an error code and an Error object,
> - * but only when @e->type is MIG_EVENT_PRECOPY_SETUP. The code is an int
> + * but only when @e->type is MIG_EVENT_SETUP. The code is an int
> * to allow for different failure modes and recovery actions.
> */
> typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
> @@ -83,9 +83,9 @@ typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
> * Register the notifier @notify to be called when a migration event occurs
> * for MIG_MODE_NORMAL, as specified by the MigrationEvent passed to @func.
> * Notifiers may receive events in any of the following orders:
> - * - MIG_EVENT_PRECOPY_SETUP -> MIG_EVENT_PRECOPY_DONE
> - * - MIG_EVENT_PRECOPY_SETUP -> MIG_EVENT_PRECOPY_FAILED
> - * - MIG_EVENT_PRECOPY_FAILED
> + * - MIG_EVENT_SETUP -> MIG_EVENT_DONE
> + * - MIG_EVENT_SETUP -> MIG_EVENT_FAILED
> + * - MIG_EVENT_FAILED
> */
> void migration_add_notifier(NotifierWithReturn *notify,
> MigrationNotifyFunc func);
> diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
> index 6f311e37ef..fddeefa26f 100644
> --- a/hw/intc/arm_gicv3_kvm.c
> +++ b/hw/intc/arm_gicv3_kvm.c
> @@ -774,7 +774,7 @@ static void vm_change_state_handler(void *opaque, bool running,
> static int kvm_arm_gicv3_notifier(NotifierWithReturn *notifier,
> MigrationEvent *e, Error **errp)
> {
> - if (e->type == MIG_EVENT_PRECOPY_DONE) {
> + if (e->type == MIG_EVENT_DONE) {
> GICv3State *s = container_of(notifier, GICv3State, cpr_notifier);
> return kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
> KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES,
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 317f1ad23b..3e2dc30da6 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -3786,7 +3786,7 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationEvent *e)
>
> should_be_hidden = qatomic_read(&n->failover_primary_hidden);
>
> - if (e->type == MIG_EVENT_PRECOPY_SETUP && !should_be_hidden) {
> + if (e->type == MIG_EVENT_SETUP && !should_be_hidden) {
> if (failover_unplug_primary(n, dev)) {
> vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
> qapi_event_send_unplug_primary(dev->id);
> @@ -3794,7 +3794,7 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationEvent *e)
> } else {
> warn_report("couldn't unplug primary device");
> }
> - } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
> + } else if (e->type == MIG_EVENT_FAILED) {
> /* We already unplugged the device let's plug it back */
> if (!failover_replug_primary(n, dev, &err)) {
> if (err) {
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index 7c03ddb961..033a546c30 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -137,7 +137,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
> container_of(notifier, VFIOLegacyContainer, cpr.transfer_notifier);
> VFIOContainer *bcontainer = VFIO_IOMMU(container);
>
> - if (e->type != MIG_EVENT_PRECOPY_FAILED) {
> + if (e->type != MIG_EVENT_FAILED) {
> return 0;
> }
>
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> index 998230d271..ffa4f8e099 100644
> --- a/hw/vfio/cpr.c
> +++ b/hw/vfio/cpr.c
> @@ -18,7 +18,7 @@
> int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
> MigrationEvent *e, Error **errp)
> {
> - if (e->type == MIG_EVENT_PRECOPY_SETUP &&
> + if (e->type == MIG_EVENT_SETUP &&
> !runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
>
> error_setg(errp,
> @@ -186,7 +186,7 @@ static int vfio_cpr_kvm_close_notifier(NotifierWithReturn *notifier,
> MigrationEvent *e,
> Error **errp)
> {
> - if (e->type == MIG_EVENT_PRECOPY_DONE) {
> + if (e->type == MIG_EVENT_DONE) {
> vfio_kvm_device_close();
> }
> return 0;
> @@ -272,9 +272,9 @@ static int vfio_cpr_pci_notifier(NotifierWithReturn *notifier,
> VFIOPCIDevice *vdev =
> container_of(notifier, VFIOPCIDevice, cpr.transfer_notifier);
>
> - if (e->type == MIG_EVENT_PRECOPY_SETUP) {
> + if (e->type == MIG_EVENT_SETUP) {
> return vfio_cpr_set_msi_virq(vdev, errp, false);
> - } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
> + } else if (e->type == MIG_EVENT_FAILED) {
> return vfio_cpr_set_msi_virq(vdev, errp, true);
> }
> return 0;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index f857dc25ed..76a902b79c 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -917,10 +917,10 @@ static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
>
> trace_vfio_migration_state_notifier(vbasedev->name, e->type);
>
> - if (e->type == MIG_EVENT_PRECOPY_FAILED) {
> + if (e->type == MIG_EVENT_FAILED) {
> /*
> * MigrationNotifyFunc may not return an error code and an Error
> - * object for MIG_EVENT_PRECOPY_FAILED. Hence, report the error
> + * object for MIG_EVENT_FAILED. Hence, report the error
> * locally and ignore the errp argument.
> */
> ret = vfio_migration_set_state_or_reset(vbasedev,
> diff --git a/migration/cpr-exec.c b/migration/cpr-exec.c
> index da287d8031..18a71828c3 100644
> --- a/migration/cpr-exec.c
> +++ b/migration/cpr-exec.c
> @@ -164,7 +164,7 @@ static void cpr_exec_cb(void *opaque)
> err = NULL;
>
> /* Note, we can go from state COMPLETED to FAILED */
> - migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
> + migration_call_notifiers(s, MIG_EVENT_FAILED, NULL);
>
> if (!migration_block_activate(&err)) {
> /* error was already reported */
> @@ -182,12 +182,12 @@ static int cpr_exec_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
> {
> MigrationState *s = migrate_get_current();
>
> - if (e->type == MIG_EVENT_PRECOPY_DONE) {
> + if (e->type == MIG_EVENT_DONE) {
> QEMUBH *cpr_exec_bh = qemu_bh_new(cpr_exec_cb, NULL);
> assert(s->state == MIGRATION_STATUS_COMPLETED);
> qemu_bh_schedule(cpr_exec_bh);
> qemu_notify_event();
> - } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
> + } else if (e->type == MIG_EVENT_FAILED) {
> cpr_exec_unpersist_state();
> }
> return 0;
> diff --git a/migration/migration.c b/migration/migration.c
> index 5bef14ea99..7ba37afb27 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1540,7 +1540,7 @@ static void migration_cleanup(MigrationState *s)
> * migration completed successfully.
> */
> if (!migration_has_failed(s)) {
> - migration_call_notifiers(s, MIG_EVENT_PRECOPY_DONE, NULL);
> + migration_call_notifiers(s, MIG_EVENT_DONE, NULL);
> }
>
> yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> @@ -1720,7 +1720,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
> notifier = (NotifierWithReturn *)elem->data;
> ret = notifier->notify(notifier, &e, errp);
> if (ret) {
> - assert(type == MIG_EVENT_PRECOPY_SETUP);
> + assert(type == MIG_EVENT_SETUP);
> return ret;
> }
> }
> @@ -3598,7 +3598,7 @@ static void migration_iteration_finish(MigrationState *s)
> * Notify FAILED before starting VM, so that devices can invoke
> * necessary fallbacks before vCPUs run again.
> */
> - migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
> + migration_call_notifiers(s, MIG_EVENT_FAILED, NULL);
>
> if (runstate_is_live(s->vm_old_state)) {
> if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> @@ -4064,7 +4064,7 @@ void migration_connect(MigrationState *s, Error *error_in)
> rate_limit = migrate_max_bandwidth();
>
> /* Notify before starting migration thread */
> - if (migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) {
> + if (migration_call_notifiers(s, MIG_EVENT_SETUP, &local_err)) {
> goto fail;
> }
> }
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 74d26a9497..f4b1f0e9e0 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -378,9 +378,9 @@ static int vdpa_net_migration_state_notifier(NotifierWithReturn *notifier,
> {
> VhostVDPAState *s = container_of(notifier, VhostVDPAState, migration_state);
>
> - if (e->type == MIG_EVENT_PRECOPY_SETUP) {
> + if (e->type == MIG_EVENT_SETUP) {
> vhost_vdpa_net_log_global_enable(s, true);
> - } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
> + } else if (e->type == MIG_EVENT_FAILED) {
> vhost_vdpa_net_log_global_enable(s, false);
> }
> return 0;
> diff --git a/ui/spice-core.c b/ui/spice-core.c
> index ce3c2954e3..ee13ecc4a5 100644
> --- a/ui/spice-core.c
> +++ b/ui/spice-core.c
> @@ -583,13 +583,13 @@ static int migration_state_notifier(NotifierWithReturn *notifier,
> return 0;
> }
>
> - if (e->type == MIG_EVENT_PRECOPY_SETUP) {
> + if (e->type == MIG_EVENT_SETUP) {
> spice_server_migrate_start(spice_server);
> - } else if (e->type == MIG_EVENT_PRECOPY_DONE ||
> + } else if (e->type == MIG_EVENT_DONE ||
> e->type == MIG_EVENT_POSTCOPY_START) {
> spice_server_migrate_end(spice_server, true);
> spice_have_target_host = false;
> - } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
> + } else if (e->type == MIG_EVENT_FAILED) {
> spice_server_migrate_end(spice_server, false);
> spice_have_target_host = false;
> }
Reviewed-by: Fabiano Rosas <farosas@suse.de>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/5] migration: Fix double notification of DONE/FAIL for postcopy
2026-01-23 12:52 ` Fabiano Rosas
2026-01-23 12:54 ` Fabiano Rosas
@ 2026-01-23 14:58 ` Peter Xu
1 sibling, 0 replies; 18+ messages in thread
From: Peter Xu @ 2026-01-23 14:58 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Juraj Marcin, Stefan Hajnoczi, Prasad Pandit,
Marc-André Lureau, Dr. David Alan Gilbert
On Fri, Jan 23, 2026 at 09:52:31AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > Migration notifiers will notify at any of three places: (1) SETUP
> > phase, (2) migration completes, (3) migration fails.
> >
> > There's actually a special case for spice: one can refer to
> > b82fc321bf ("Postcopy+spice: Pass spice migration data earlier"). It
> > doesn't need another 4th event because in commit 9d9babf78d ("migration:
> > MigrationEvent for notifiers") we merged it together with the DONE event.
> >
> > The merge makes some sense if we treat "switchover" of postcopy as "DONE",
> > however that also means for postcopy we'll notify DONE twice.. The other
> > one at the end of postcopy when migration_cleanup().
> >
> > In reality, the current code base will also notify FAILED for postcopy
> > twice. It's because an (maybe accidental) change in commit
> > 4af667f87c ("migration: notifier error checking").
> >
> > First of all, we still need that notification when switchover as stated in
> > Dave's commit, however that's only needed for spice. To fix it, introduce
> > POSTCOPY_START event to differenciate it from DONE. Use that instead in
> > postcopy_start(). Then spice will need to capture this event too.
> >
> > Then we remove the extra FAILED notification in postcopy_start().
> >
> > If one wonder if other DONE users should also monitor POSTCOPY_START
> > event.. We have two more DONE users:
> >
> > - kvm_arm_gicv3_notifier
> > - cpr_exec_notifier
> >
> > Both of them do not need a notification for POSTCOPY_START, but only when
> > migration completed. Actually, both of them are used in CPR, which doesn't
> > support postcopy.
> >
> > I didn't attach Fixes: because I am not aware of any real bug on such
> > double reporting. I'm wildly guessing the 2nd notify might be silently
> > ignored in many cases. However this is still worth fixing.
> >
> > Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > include/migration/misc.h | 1 +
> > migration/migration.c | 3 +--
> > ui/spice-core.c | 3 ++-
> > 3 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > index e26d418a6e..b002466e10 100644
> > --- a/include/migration/misc.h
> > +++ b/include/migration/misc.h
> > @@ -63,6 +63,7 @@ typedef enum MigrationEventType {
> > MIG_EVENT_PRECOPY_SETUP,
> > MIG_EVENT_PRECOPY_DONE,
> > MIG_EVENT_PRECOPY_FAILED,
> > + MIG_EVENT_POSTCOPY_START,
> > MIG_EVENT_MAX
> > } MigrationEventType;
>
> With the new state there's a doc text to update at
> migration_add_notifier below.
Yeh I noticed it, I didn't do it as I found POSTCOPY_START is special in
this case marking an optional phase for postcopy-only.
I can still update it, though.. Then it'll look like:
- MIG_EVENT_SETUP [-> MIG_EVENT_POSTCOPY_START] -> MIG_EVENT_DONE
- MIG_EVENT_SETUP [-> MIG_EVENT_POSTCOPY_START] -> MIG_EVENT_FAILED
- MIG_EVENT_FAILED
I'll also move it above the enum instead if no objections.
--
Peter Xu
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/5] migration: Notify migration FAILED before starting VM
2026-01-23 12:59 ` Fabiano Rosas
@ 2026-01-23 15:40 ` Peter Xu
2026-01-23 17:36 ` Fabiano Rosas
0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2026-01-23 15:40 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Juraj Marcin, Stefan Hajnoczi, Prasad Pandit,
Cédric Le Goater, Marc-André Lureau
On Fri, Jan 23, 2026 at 09:59:35AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > Devices may opt-in migration FAILED notifiers to be invoked when migration
> > fails. Currently, the notifications happen in migration_cleanup(). It is
> > normally fine, but maybe not ideal if there's dependency of the fallback
> > v.s. VM starts.
> >
> > This patch moves the FAILED notification earlier, so that if the failure
> > happened during switchover, it'll notify before VM restart.
> >
>
> The change to FAILED in patch 2 should come to this patch to avoid
> having a window where the notification only happens at the end.
Hmm.. Isn't that expected? Even after patch 2, we still notify FAILED at
the end for precopy. It's the same for postcopy.
For a failed postcopy we have following behavior:
Before patch 2
==============
- notify FAILED (during switchover)
- vm_start()
- notify FAILED (during migration_cleanup)
After patch 2
=============
- vm_start()
- notify FAILED (during migration_cleanup)
So patch 2 fixes the duplicate issue, and only fixes that.
After patch 3
=============
- notify FAILED (during migration_iteration_finish)
- vm_start()
Patch 3 changes the place of FAILED notification so that it happens always
before vm_start(), for both precopy and postcopy.
--
Peter Xu
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/5] migration: Notify migration FAILED before starting VM
2026-01-23 15:40 ` Peter Xu
@ 2026-01-23 17:36 ` Fabiano Rosas
2026-01-26 15:21 ` Peter Xu
0 siblings, 1 reply; 18+ messages in thread
From: Fabiano Rosas @ 2026-01-23 17:36 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juraj Marcin, Stefan Hajnoczi, Prasad Pandit,
Cédric Le Goater, Marc-André Lureau
Peter Xu <peterx@redhat.com> writes:
> On Fri, Jan 23, 2026 at 09:59:35AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > Devices may opt-in migration FAILED notifiers to be invoked when migration
>> > fails. Currently, the notifications happen in migration_cleanup(). It is
>> > normally fine, but maybe not ideal if there's dependency of the fallback
>> > v.s. VM starts.
>> >
>> > This patch moves the FAILED notification earlier, so that if the failure
>> > happened during switchover, it'll notify before VM restart.
>> >
>>
>> The change to FAILED in patch 2 should come to this patch to avoid
>> having a window where the notification only happens at the end.
>
> Hmm.. Isn't that expected? Even after patch 2, we still notify FAILED at
> the end for precopy. It's the same for postcopy.
>
Sorry, I meant: s/at the end/after vm_start/.
> For a failed postcopy we have following behavior:
>
> Before patch 2
> ==============
>
> - notify FAILED (during switchover)
> - vm_start()
> - notify FAILED (during migration_cleanup)
>
> After patch 2
> =============
>
> - vm_start()
> - notify FAILED (during migration_cleanup)
>
> So patch 2 fixes the duplicate issue, and only fixes that.
>
> After patch 3
> =============
>
> - notify FAILED (during migration_iteration_finish)
> - vm_start()
>
> Patch 3 changes the place of FAILED notification so that it happens always
> before vm_start(), for both precopy and postcopy.
Right, my point is that with patch 3 we're establishing that the correct
place to notify is before vm_start(). But after patch 2, *if* any driver
actually depends on being informed of failure *before* starting the VM,
that will not happen. I think both changes could be made at once so that
this intermediate state never exists.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/5] migration: Notify migration FAILED before starting VM
2026-01-23 17:36 ` Fabiano Rosas
@ 2026-01-26 15:21 ` Peter Xu
2026-01-26 19:20 ` Fabiano Rosas
0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2026-01-26 15:21 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Juraj Marcin, Stefan Hajnoczi, Prasad Pandit,
Cédric Le Goater, Marc-André Lureau
On Fri, Jan 23, 2026 at 02:36:28PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Fri, Jan 23, 2026 at 09:59:35AM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >>
> >> > Devices may opt-in migration FAILED notifiers to be invoked when migration
> >> > fails. Currently, the notifications happen in migration_cleanup(). It is
> >> > normally fine, but maybe not ideal if there's dependency of the fallback
> >> > v.s. VM starts.
> >> >
> >> > This patch moves the FAILED notification earlier, so that if the failure
> >> > happened during switchover, it'll notify before VM restart.
> >> >
> >>
> >> The change to FAILED in patch 2 should come to this patch to avoid
> >> having a window where the notification only happens at the end.
> >
> > Hmm.. Isn't that expected? Even after patch 2, we still notify FAILED at
> > the end for precopy. It's the same for postcopy.
> >
>
> Sorry, I meant: s/at the end/after vm_start/.
>
> > For a failed postcopy we have following behavior:
> >
> > Before patch 2
> > ==============
> >
> > - notify FAILED (during switchover)
> > - vm_start()
> > - notify FAILED (during migration_cleanup)
> >
> > After patch 2
> > =============
> >
> > - vm_start()
> > - notify FAILED (during migration_cleanup)
> >
> > So patch 2 fixes the duplicate issue, and only fixes that.
> >
> > After patch 3
> > =============
> >
> > - notify FAILED (during migration_iteration_finish)
> > - vm_start()
> >
> > Patch 3 changes the place of FAILED notification so that it happens always
> > before vm_start(), for both precopy and postcopy.
>
> Right, my point is that with patch 3 we're establishing that the correct
> place to notify is before vm_start().
Yep, likely not strictly correctness in terms of current notifiers, but
since Stefan may have yet another use case that may require a notifier to
be done before vm_start(), it makes more sense for us to move, IMHO.
> But after patch 2, *if* any driver actually depends on being informed of
> failure *before* starting the VM, that will not happen. I think both
> changes could be made at once so that this intermediate state never
> exists.
I see what you meant. I think there should have no such user.
It's because we always notify FAILED at migration_cleanup() for precopy, or
even postcopy before the cpr-exec work (before QEMU 9.0).
That behavior of "notify FAILED before vm_start() for postcopy" is very
specific and only added after commit 4af667f87c ("migration: notifier error
checking"). IOW, before QEMU 9.0, for both precopy and postcopy we always
notify FAILED in migration_cleanup(), never before vm_start().
I mentioned this in the commit log of previous patch too, where I bet the
additional FAILED notification added in 4af667f87c for postcopy path is an
accident (to make it pairing with the "reused DONE", however it turns out
we likely shouldn't do either of them..). So I don't expect anything will
depend on that behavior, and only for postcopy.
The benefit of splitting this patch and previous one is, the previous one
is a "fix" of duplicated notifications, hence if we need a backport that
can be done without this one. Said that, I don't think one should need
it.. It should also make each commits slightly easier to follow, because
they're fundamentally two changes.
Let me know what you think after reading my explanations above. I prefer
the split like as-is, but I can still squash it to close the trivially
small window that you described. I'll make sure if merged the commit
message contains separate discussions on two problems.
--
Peter Xu
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/5] migration: Notifier fixes for 11.0
2026-01-22 23:03 [PATCH 0/5] migration: Notifier fixes for 11.0 Peter Xu
` (4 preceding siblings ...)
2026-01-22 23:03 ` [PATCH 5/5] migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_* Peter Xu
@ 2026-01-26 15:58 ` Stefan Hajnoczi
5 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2026-01-26 15:58 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas, Prasad Pandit
[-- Attachment #1: Type: text/plain, Size: 1902 bytes --]
On Thu, Jan 22, 2026 at 06:03:26PM -0500, Peter Xu wrote:
> CI: https://gitlab.com/peterx/qemu/-/pipelines/2280356561
>
> Two major goals for this small series:
>
> - Fix postcopy issue where DONE and FAILED notifiers will be invoked twice
>
> - Move FAILED notifier to be before vm_start() if the failure happens
> during switchover (where we will stop the VM first)
>
> The 2nd goal will be needed by Stefan's ongoing work on block persistent
> reservations, where a fallback should be required on src to happen before
> vm_start(). Instead of introducing another FAILED_BEFORE_START, this
> patchset should make FAILED work instead.
>
> Patch 1 adds a tracepoint for me to verify this fix.
>
> Patch 2-3 are the real changes of above two.
>
> Patch 3-4 are some cleanups alone the context that we can do, hence
> attached at the end.
>
> More details in commit logs individually. Comments welcomed, thanks.
>
> Peter Xu (5):
> migration: Add a tracepoint for invoking migration notifiers
> migration: Fix double notification of DONE/FAIL for postcopy
> migration: Notify migration FAILED before starting VM
> migration: Drop explicit block activation in postcopy fail path
> migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_*
>
> include/migration/misc.h | 15 ++++++++-------
> hw/intc/arm_gicv3_kvm.c | 2 +-
> hw/net/virtio-net.c | 4 ++--
> hw/vfio/cpr-legacy.c | 2 +-
> hw/vfio/cpr.c | 8 ++++----
> hw/vfio/migration.c | 4 ++--
> migration/cpr-exec.c | 6 +++---
> migration/migration.c | 30 +++++++++++++++++++++---------
> net/vhost-vdpa.c | 4 ++--
> ui/spice-core.c | 7 ++++---
> migration/trace-events | 1 +
> 11 files changed, 49 insertions(+), 34 deletions(-)
>
> --
> 2.50.1
>
Thank you!
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/5] migration: Notify migration FAILED before starting VM
2026-01-26 15:21 ` Peter Xu
@ 2026-01-26 19:20 ` Fabiano Rosas
0 siblings, 0 replies; 18+ messages in thread
From: Fabiano Rosas @ 2026-01-26 19:20 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juraj Marcin, Stefan Hajnoczi, Prasad Pandit,
Cédric Le Goater, Marc-André Lureau
Peter Xu <peterx@redhat.com> writes:
> On Fri, Jan 23, 2026 at 02:36:28PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Fri, Jan 23, 2026 at 09:59:35AM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >>
>> >> > Devices may opt-in migration FAILED notifiers to be invoked when migration
>> >> > fails. Currently, the notifications happen in migration_cleanup(). It is
>> >> > normally fine, but maybe not ideal if there's dependency of the fallback
>> >> > v.s. VM starts.
>> >> >
>> >> > This patch moves the FAILED notification earlier, so that if the failure
>> >> > happened during switchover, it'll notify before VM restart.
>> >> >
>> >>
>> >> The change to FAILED in patch 2 should come to this patch to avoid
>> >> having a window where the notification only happens at the end.
>> >
>> > Hmm.. Isn't that expected? Even after patch 2, we still notify FAILED at
>> > the end for precopy. It's the same for postcopy.
>> >
>>
>> Sorry, I meant: s/at the end/after vm_start/.
>>
>> > For a failed postcopy we have following behavior:
>> >
>> > Before patch 2
>> > ==============
>> >
>> > - notify FAILED (during switchover)
>> > - vm_start()
>> > - notify FAILED (during migration_cleanup)
>> >
>> > After patch 2
>> > =============
>> >
>> > - vm_start()
>> > - notify FAILED (during migration_cleanup)
>> >
>> > So patch 2 fixes the duplicate issue, and only fixes that.
>> >
>> > After patch 3
>> > =============
>> >
>> > - notify FAILED (during migration_iteration_finish)
>> > - vm_start()
>> >
>> > Patch 3 changes the place of FAILED notification so that it happens always
>> > before vm_start(), for both precopy and postcopy.
>>
>> Right, my point is that with patch 3 we're establishing that the correct
>> place to notify is before vm_start().
>
> Yep, likely not strictly correctness in terms of current notifiers, but
> since Stefan may have yet another use case that may require a notifier to
> be done before vm_start(), it makes more sense for us to move, IMHO.
>
>> But after patch 2, *if* any driver actually depends on being informed of
>> failure *before* starting the VM, that will not happen. I think both
>> changes could be made at once so that this intermediate state never
>> exists.
>
> I see what you meant. I think there should have no such user.
>
> It's because we always notify FAILED at migration_cleanup() for precopy, or
> even postcopy before the cpr-exec work (before QEMU 9.0).
>
> That behavior of "notify FAILED before vm_start() for postcopy" is very
> specific and only added after commit 4af667f87c ("migration: notifier error
> checking"). IOW, before QEMU 9.0, for both precopy and postcopy we always
> notify FAILED in migration_cleanup(), never before vm_start().
>
> I mentioned this in the commit log of previous patch too, where I bet the
> additional FAILED notification added in 4af667f87c for postcopy path is an
> accident (to make it pairing with the "reused DONE", however it turns out
> we likely shouldn't do either of them..). So I don't expect anything will
> depend on that behavior, and only for postcopy.
>
> The benefit of splitting this patch and previous one is, the previous one
> is a "fix" of duplicated notifications, hence if we need a backport that
> can be done without this one. Said that, I don't think one should need
> it.. It should also make each commits slightly easier to follow, because
> they're fundamentally two changes.
>
> Let me know what you think after reading my explanations above. I prefer
> the split like as-is, but I can still squash it to close the trivially
> small window that you described. I'll make sure if merged the commit
> message contains separate discussions on two problems.
Ok, fair points.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-01-26 19:20 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-22 23:03 [PATCH 0/5] migration: Notifier fixes for 11.0 Peter Xu
2026-01-22 23:03 ` [PATCH 1/5] migration: Add a tracepoint for invoking migration notifiers Peter Xu
2026-01-23 12:25 ` Fabiano Rosas
2026-01-22 23:03 ` [PATCH 2/5] migration: Fix double notification of DONE/FAIL for postcopy Peter Xu
2026-01-23 12:52 ` Fabiano Rosas
2026-01-23 12:54 ` Fabiano Rosas
2026-01-23 14:58 ` Peter Xu
2026-01-22 23:03 ` [PATCH 3/5] migration: Notify migration FAILED before starting VM Peter Xu
2026-01-23 12:59 ` Fabiano Rosas
2026-01-23 15:40 ` Peter Xu
2026-01-23 17:36 ` Fabiano Rosas
2026-01-26 15:21 ` Peter Xu
2026-01-26 19:20 ` Fabiano Rosas
2026-01-22 23:03 ` [PATCH 4/5] migration: Drop explicit block activation in postcopy fail path Peter Xu
2026-01-23 12:59 ` Fabiano Rosas
2026-01-22 23:03 ` [PATCH 5/5] migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_* Peter Xu
2026-01-23 13:02 ` Fabiano Rosas
2026-01-26 15:58 ` [PATCH 0/5] migration: Notifier fixes for 11.0 Stefan Hajnoczi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.