All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] migration: Fix regressions
@ 2024-12-13 16:01 Fabiano Rosas
  2024-12-13 16:01 ` [PATCH 1/2] migration/multifd: Fix compat with QEMU < 9.0 Fabiano Rosas
  2024-12-13 16:01 ` [PATCH 2/2] s390x: Fix CSS migration Fabiano Rosas
  0 siblings, 2 replies; 5+ messages in thread
From: Fabiano Rosas @ 2024-12-13 16:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu

Hello,

A couple of fixes for regressions we have open in gitlab:

multifd from old QEMUs - https://gitlab.com/qemu-project/qemu/-/issues/2720
s390x CSS - https://gitlab.com/qemu-project/qemu/-/issues/2704

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1587733433

Fabiano Rosas (2):
  migration/multifd: Fix compat with QEMU < 9.0
  s390x: Fix CSS migration

 hw/s390x/s390-virtio-ccw.c |  2 +-
 migration/multifd.c        | 15 +++++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

-- 
2.35.3



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] migration/multifd: Fix compat with QEMU < 9.0
  2024-12-13 16:01 [PATCH 0/2] migration: Fix regressions Fabiano Rosas
@ 2024-12-13 16:01 ` Fabiano Rosas
  2024-12-16 16:02   ` Peter Xu
  2024-12-13 16:01 ` [PATCH 2/2] s390x: Fix CSS migration Fabiano Rosas
  1 sibling, 1 reply; 5+ messages in thread
From: Fabiano Rosas @ 2024-12-13 16:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, qemu-stable

Commit f5f48a7891 ("migration/multifd: Separate SYNC request with
normal jobs") changed the multifd source side to stop sending data
along with the MULTIFD_FLAG_SYNC, effectively introducing the concept
of a SYNC-only packet. Relying on that, commit d7e58f412c
("migration/multifd: Don't send ram data during SYNC") later came
along and skipped reading data from SYNC packets.

In a versions timeline like this:

  8.2 f5f48a7 9.0 9.1 d7e58f41 9.2

The issue arises that QEMUs < 9.0 still send data along with SYNC, but
QEMUs > 9.1 don't gather that data anymore. This leads to various
kinds of migration failures due to desync/missing data.

Stop checking for a SYNC packet on the destination and unconditionally
unfill the packet.

From now on:

old -> new:
the source sends data + sync, destination reads normally

new -> new:
source sends only sync, destination reads zeros

new -> old:
source sends only sync, destination reads zeros

CC: qemu-stable@nongnu.org
Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 498e71fd10..8d0a763a72 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -252,9 +252,8 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     p->packet_num = be64_to_cpu(packet->packet_num);
     p->packets_recved++;
 
-    if (!(p->flags & MULTIFD_FLAG_SYNC)) {
-        ret = multifd_ram_unfill_packet(p, errp);
-    }
+    /* Always unfill, old QEMUs (<9.0) send data along with SYNC */
+    ret = multifd_ram_unfill_packet(p, errp);
 
     trace_multifd_recv_unfill(p->id, p->packet_num, p->flags,
                               p->next_packet_size);
@@ -1151,9 +1150,13 @@ static void *multifd_recv_thread(void *opaque)
             flags = p->flags;
             /* recv methods don't know how to handle the SYNC flag */
             p->flags &= ~MULTIFD_FLAG_SYNC;
-            if (!(flags & MULTIFD_FLAG_SYNC)) {
-                has_data = p->normal_num || p->zero_num;
-            }
+
+            /*
+             * Even if it's a SYNC packet, this needs to be set
+             * because older QEMUs (<9.0) still send data along with
+             * the SYNC packet.
+             */
+            has_data = p->normal_num || p->zero_num;
             qemu_mutex_unlock(&p->mutex);
         } else {
             /*
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] s390x: Fix CSS migration
  2024-12-13 16:01 [PATCH 0/2] migration: Fix regressions Fabiano Rosas
  2024-12-13 16:01 ` [PATCH 1/2] migration/multifd: Fix compat with QEMU < 9.0 Fabiano Rosas
@ 2024-12-13 16:01 ` Fabiano Rosas
  2024-12-16  7:17   ` Thomas Huth
  1 sibling, 1 reply; 5+ messages in thread
From: Fabiano Rosas @ 2024-12-13 16:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Paolo Bonzini, qemu-stable, Richard Henderson,
	David Hildenbrand, Ilya Leoshkevich, Halil Pasic,
	Christian Borntraeger, Eric Farman, Thomas Huth

Commit a55ae46683 ("s390: move css_migration_enabled from machine to
css.c") disabled CSS migration globally instead of doing it
per-instance.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: qemu-stable@nongnu.org #9.1
Fixes: a55ae46683 ("s390: move css_migration_enabled from machine to css.c")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2704
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
Consider enabling migration-compat-s390x in CI, it would have caught
this. It's too late for this development cycle because the backward
migration is broken, but in 10.1 both sides will agree on the
css_migration_enabled value.
---
 hw/s390x/s390-virtio-ccw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index fe03f716f3..701b2d4e2e 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -1189,6 +1189,7 @@ static void ccw_machine_2_9_instance_options(MachineState *machine)
     s390_cpudef_featoff_greater(12, 1, S390_FEAT_ZPCI);
     s390_cpudef_featoff_greater(12, 1, S390_FEAT_ADAPTER_INT_SUPPRESSION);
     s390_cpudef_featoff_greater(12, 1, S390_FEAT_ADAPTER_EVENT_NOTIFICATION);
+    css_migration_enabled = false;
 }
 
 static void ccw_machine_2_9_class_options(MachineClass *mc)
@@ -1201,7 +1202,6 @@ static void ccw_machine_2_9_class_options(MachineClass *mc)
     ccw_machine_2_10_class_options(mc);
     compat_props_add(mc->compat_props, hw_compat_2_9, hw_compat_2_9_len);
     compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
-    css_migration_enabled = false;
 }
 DEFINE_CCW_MACHINE(2, 9);
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] s390x: Fix CSS migration
  2024-12-13 16:01 ` [PATCH 2/2] s390x: Fix CSS migration Fabiano Rosas
@ 2024-12-16  7:17   ` Thomas Huth
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Huth @ 2024-12-16  7:17 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, Paolo Bonzini, qemu-stable, Richard Henderson,
	David Hildenbrand, Ilya Leoshkevich, Halil Pasic,
	Christian Borntraeger, Eric Farman

On 13/12/2024 17.01, Fabiano Rosas wrote:
> Commit a55ae46683 ("s390: move css_migration_enabled from machine to
> css.c") disabled CSS migration globally instead of doing it
> per-instance.
> 
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: qemu-stable@nongnu.org #9.1
> Fixes: a55ae46683 ("s390: move css_migration_enabled from machine to css.c")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2704
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> Consider enabling migration-compat-s390x in CI, it would have caught
> this. It's too late for this development cycle because the backward
> migration is broken, but in 10.1 both sides will agree on the
> css_migration_enabled value.
> ---
>   hw/s390x/s390-virtio-ccw.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index fe03f716f3..701b2d4e2e 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -1189,6 +1189,7 @@ static void ccw_machine_2_9_instance_options(MachineState *machine)
>       s390_cpudef_featoff_greater(12, 1, S390_FEAT_ZPCI);
>       s390_cpudef_featoff_greater(12, 1, S390_FEAT_ADAPTER_INT_SUPPRESSION);
>       s390_cpudef_featoff_greater(12, 1, S390_FEAT_ADAPTER_EVENT_NOTIFICATION);
> +    css_migration_enabled = false;
>   }
>   
>   static void ccw_machine_2_9_class_options(MachineClass *mc)
> @@ -1201,7 +1202,6 @@ static void ccw_machine_2_9_class_options(MachineClass *mc)
>       ccw_machine_2_10_class_options(mc);
>       compat_props_add(mc->compat_props, hw_compat_2_9, hw_compat_2_9_len);
>       compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
> -    css_migration_enabled = false;
>   }
>   DEFINE_CCW_MACHINE(2, 9);
>   

D'oh!

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] migration/multifd: Fix compat with QEMU < 9.0
  2024-12-13 16:01 ` [PATCH 1/2] migration/multifd: Fix compat with QEMU < 9.0 Fabiano Rosas
@ 2024-12-16 16:02   ` Peter Xu
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Xu @ 2024-12-16 16:02 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, qemu-stable

On Fri, Dec 13, 2024 at 01:01:19PM -0300, Fabiano Rosas wrote:
> Commit f5f48a7891 ("migration/multifd: Separate SYNC request with
> normal jobs") changed the multifd source side to stop sending data
> along with the MULTIFD_FLAG_SYNC, effectively introducing the concept
> of a SYNC-only packet. Relying on that, commit d7e58f412c
> ("migration/multifd: Don't send ram data during SYNC") later came
> along and skipped reading data from SYNC packets.
> 
> In a versions timeline like this:
> 
>   8.2 f5f48a7 9.0 9.1 d7e58f41 9.2
> 
> The issue arises that QEMUs < 9.0 still send data along with SYNC, but
> QEMUs > 9.1 don't gather that data anymore. This leads to various
> kinds of migration failures due to desync/missing data.
> 
> Stop checking for a SYNC packet on the destination and unconditionally
> unfill the packet.
> 
> From now on:
> 
> old -> new:
> the source sends data + sync, destination reads normally
> 
> new -> new:
> source sends only sync, destination reads zeros
> 
> new -> old:
> source sends only sync, destination reads zeros
> 
> CC: qemu-stable@nongnu.org
> Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-12-16 16:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-13 16:01 [PATCH 0/2] migration: Fix regressions Fabiano Rosas
2024-12-13 16:01 ` [PATCH 1/2] migration/multifd: Fix compat with QEMU < 9.0 Fabiano Rosas
2024-12-16 16:02   ` Peter Xu
2024-12-13 16:01 ` [PATCH 2/2] s390x: Fix CSS migration Fabiano Rosas
2024-12-16  7:17   ` Thomas Huth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.