All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] migration: Fix possible division by zero on calc expected downtime
@ 2026-05-11 14:10 Peter Xu
  2026-05-11 14:26 ` Peter Maydell
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2026-05-11 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Juraj Marcin, Peter Maydell

Commit dd4fe8844b changed the reporting of expected downtime behavior, so
that the value will be calculated on-demand.  One side effect on the change
is QEMU will allow the calculation to happen anytime even if there's no
transfer happening for a short while.

PeterM reported an ubsan report from clang when running migration-test with
aarch64 binary on x86_64 hosts.  I can also reproduce if I run the test
concurrently so some of the src QEMU may not get chance to push any data,
causing mbps to be 0:

../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'

Fix it by checking mbps non-zero before the devision.

Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b6f78eb3ac..b06e577328 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
 /* Return expected downtime (unit: milliseconds) */
 int64_t migration_downtime_calc_expected(MigrationState *s)
 {
+    double bw;
+
     if (mig_stats.dirty_sync_count <= 1) {
         return migrate_downtime_limit();
     }
 
-    return mig_stats.dirty_bytes_last_sync /
-        migration_get_switchover_bw(s) * 1000;
+    bw = migration_get_switchover_bw(s) * 1000;
+    if (!bw) {
+        return INT64_MAX;
+    }
+
+    return mig_stats.dirty_bytes_last_sync / bw;
 }
 
 static void populate_time_info(MigrationInfo *info, MigrationState *s)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
  2026-05-11 14:10 [PATCH] migration: Fix possible division by zero on calc expected downtime Peter Xu
@ 2026-05-11 14:26 ` Peter Maydell
  2026-05-11 14:51   ` Peter Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2026-05-11 14:26 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin

On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
>
> Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> that the value will be calculated on-demand.  One side effect on the change
> is QEMU will allow the calculation to happen anytime even if there's no
> transfer happening for a short while.
>
> PeterM reported an ubsan report from clang when running migration-test with
> aarch64 binary on x86_64 hosts.  I can also reproduce if I run the test
> concurrently so some of the src QEMU may not get chance to push any data,
> causing mbps to be 0:
>
> ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
>
> Fix it by checking mbps non-zero before the devision.
>
> Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index b6f78eb3ac..b06e577328 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
>  /* Return expected downtime (unit: milliseconds) */
>  int64_t migration_downtime_calc_expected(MigrationState *s)
>  {
> +    double bw;
> +
>      if (mig_stats.dirty_sync_count <= 1) {
>          return migrate_downtime_limit();
>      }
>
> -    return mig_stats.dirty_bytes_last_sync /
> -        migration_get_switchover_bw(s) * 1000;
> +    bw = migration_get_switchover_bw(s) * 1000;
> +    if (!bw) {
> +        return INT64_MAX;
> +    }
> +
> +    return mig_stats.dirty_bytes_last_sync / bw;

I think this is still UB if bw is a value that's nonzero
but still small enough that mig_stats.dirty_bytes_last_sync / bw
is greater than INT64_MAX. It might be more robust to do
something like:

    double expected = mig_stats.dirty_bytes_last_sync /
        migration_get_switchover_bw(s) * 1000;

    if (expected >= (double)INT64_MAX) {
        return INT64_MAX;
    }
    return (int64_t)expected;

(which will also handle the infinity case, because
in IEEE fp infinity is larger than any finite value).

-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
  2026-05-11 14:26 ` Peter Maydell
@ 2026-05-11 14:51   ` Peter Xu
  2026-05-11 15:18     ` Peter Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2026-05-11 14:51 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin

On Mon, May 11, 2026 at 03:26:06PM +0100, Peter Maydell wrote:
> On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
> >
> > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > that the value will be calculated on-demand.  One side effect on the change
> > is QEMU will allow the calculation to happen anytime even if there's no
> > transfer happening for a short while.
> >
> > PeterM reported an ubsan report from clang when running migration-test with
> > aarch64 binary on x86_64 hosts.  I can also reproduce if I run the test
> > concurrently so some of the src QEMU may not get chance to push any data,
> > causing mbps to be 0:
> >
> > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> >
> > Fix it by checking mbps non-zero before the devision.
> >
> > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index b6f78eb3ac..b06e577328 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
> >  /* Return expected downtime (unit: milliseconds) */
> >  int64_t migration_downtime_calc_expected(MigrationState *s)
> >  {
> > +    double bw;
> > +
> >      if (mig_stats.dirty_sync_count <= 1) {
> >          return migrate_downtime_limit();
> >      }
> >
> > -    return mig_stats.dirty_bytes_last_sync /
> > -        migration_get_switchover_bw(s) * 1000;
> > +    bw = migration_get_switchover_bw(s) * 1000;
> > +    if (!bw) {
> > +        return INT64_MAX;
> > +    }
> > +
> > +    return mig_stats.dirty_bytes_last_sync / bw;
> 
> I think this is still UB if bw is a value that's nonzero
> but still small enough that mig_stats.dirty_bytes_last_sync / bw
> is greater than INT64_MAX. It might be more robust to do
> something like:
> 
>     double expected = mig_stats.dirty_bytes_last_sync /
>         migration_get_switchover_bw(s) * 1000;
> 
>     if (expected >= (double)INT64_MAX) {
>         return INT64_MAX;
>     }
>     return (int64_t)expected;
> 
> (which will also handle the infinity case, because
> in IEEE fp infinity is larger than any finite value).

I almost treated mbps as integer, because when it's non-zero it almost
should be at least ~10byte/s (our unit is (uint)transferred / 100ms).. so I
doubt if the issue can happen. But indeed it's theoretically flawed.

I'll repost, thanks for the careful review, Peter.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
  2026-05-11 14:51   ` Peter Xu
@ 2026-05-11 15:18     ` Peter Xu
  2026-05-11 15:24       ` Peter Maydell
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2026-05-11 15:18 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin

On Mon, May 11, 2026 at 10:51:00AM -0400, Peter Xu wrote:
> On Mon, May 11, 2026 at 03:26:06PM +0100, Peter Maydell wrote:
> > On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > > that the value will be calculated on-demand.  One side effect on the change
> > > is QEMU will allow the calculation to happen anytime even if there's no
> > > transfer happening for a short while.
> > >
> > > PeterM reported an ubsan report from clang when running migration-test with
> > > aarch64 binary on x86_64 hosts.  I can also reproduce if I run the test
> > > concurrently so some of the src QEMU may not get chance to push any data,
> > > causing mbps to be 0:
> > >
> > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> > >
> > > Fix it by checking mbps non-zero before the devision.
> > >
> > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c | 10 ++++++++--
> > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index b6f78eb3ac..b06e577328 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
> > >  /* Return expected downtime (unit: milliseconds) */
> > >  int64_t migration_downtime_calc_expected(MigrationState *s)
> > >  {
> > > +    double bw;
> > > +
> > >      if (mig_stats.dirty_sync_count <= 1) {
> > >          return migrate_downtime_limit();
> > >      }
> > >
> > > -    return mig_stats.dirty_bytes_last_sync /
> > > -        migration_get_switchover_bw(s) * 1000;
> > > +    bw = migration_get_switchover_bw(s) * 1000;

PS: I think I got the math wrong, not that it matters anymore.. :(

> > > +    if (!bw) {
> > > +        return INT64_MAX;
> > > +    }
> > > +
> > > +    return mig_stats.dirty_bytes_last_sync / bw;
> > 
> > I think this is still UB if bw is a value that's nonzero
> > but still small enough that mig_stats.dirty_bytes_last_sync / bw
> > is greater than INT64_MAX. It might be more robust to do
> > something like:
> > 
> >     double expected = mig_stats.dirty_bytes_last_sync /
> >         migration_get_switchover_bw(s) * 1000;
> > 
> >     if (expected >= (double)INT64_MAX) {
> >         return INT64_MAX;
> >     }

This actually fails to fix the ubsan issue and I can still reproduce,
because the ubsan issue was a Nan reported, where it seems when Nan found
in normal such evaluations it'll return FALSE.  I'll need to use:

        if (expected < (double)INT64_MAX) {
            return (int64_t)expected;
        }

I'll add a rich comment when repost.

> >     return (int64_t)expected;
> > 
> > (which will also handle the infinity case, because
> > in IEEE fp infinity is larger than any finite value).
> 
> I almost treated mbps as integer, because when it's non-zero it almost
> should be at least ~10byte/s (our unit is (uint)transferred / 100ms).. so I
> doubt if the issue can happen. But indeed it's theoretically flawed.
> 
> I'll repost, thanks for the careful review, Peter.
> 
> -- 
> Peter Xu

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
  2026-05-11 15:18     ` Peter Xu
@ 2026-05-11 15:24       ` Peter Maydell
  2026-05-11 15:29         ` Peter Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2026-05-11 15:24 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin

On Mon, 11 May 2026 at 16:18, Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, May 11, 2026 at 10:51:00AM -0400, Peter Xu wrote:
> > On Mon, May 11, 2026 at 03:26:06PM +0100, Peter Maydell wrote:
> > > On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > > > that the value will be calculated on-demand.  One side effect on the change
> > > > is QEMU will allow the calculation to happen anytime even if there's no
> > > > transfer happening for a short while.
> > > >
> > > > PeterM reported an ubsan report from clang when running migration-test with
> > > > aarch64 binary on x86_64 hosts.  I can also reproduce if I run the test
> > > > concurrently so some of the src QEMU may not get chance to push any data,
> > > > causing mbps to be 0:
> > > >
> > > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> > > >
> > > > Fix it by checking mbps non-zero before the devision.
> > > >
> > > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > > > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  migration/migration.c | 10 ++++++++--
> > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index b6f78eb3ac..b06e577328 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
> > > >  /* Return expected downtime (unit: milliseconds) */
> > > >  int64_t migration_downtime_calc_expected(MigrationState *s)
> > > >  {
> > > > +    double bw;
> > > > +
> > > >      if (mig_stats.dirty_sync_count <= 1) {
> > > >          return migrate_downtime_limit();
> > > >      }
> > > >
> > > > -    return mig_stats.dirty_bytes_last_sync /
> > > > -        migration_get_switchover_bw(s) * 1000;
> > > > +    bw = migration_get_switchover_bw(s) * 1000;
>
> PS: I think I got the math wrong, not that it matters anymore.. :(
>
> > > > +    if (!bw) {
> > > > +        return INT64_MAX;
> > > > +    }
> > > > +
> > > > +    return mig_stats.dirty_bytes_last_sync / bw;
> > >
> > > I think this is still UB if bw is a value that's nonzero
> > > but still small enough that mig_stats.dirty_bytes_last_sync / bw
> > > is greater than INT64_MAX. It might be more robust to do
> > > something like:
> > >
> > >     double expected = mig_stats.dirty_bytes_last_sync /
> > >         migration_get_switchover_bw(s) * 1000;
> > >
> > >     if (expected >= (double)INT64_MAX) {
> > >         return INT64_MAX;
> > >     }
>
> This actually fails to fix the ubsan issue and I can still reproduce,
> because the ubsan issue was a Nan reported, where it seems when Nan found
> in normal such evaluations it'll return FALSE.  I'll need to use:
>
>         if (expected < (double)INT64_MAX) {
>             return (int64_t)expected;
>         }
>

Hmm, how do we get a NaN ?

-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
  2026-05-11 15:24       ` Peter Maydell
@ 2026-05-11 15:29         ` Peter Xu
  2026-05-11 15:36           ` Peter Maydell
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2026-05-11 15:29 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin

On Mon, May 11, 2026 at 04:24:24PM +0100, Peter Maydell wrote:
> > This actually fails to fix the ubsan issue and I can still reproduce,
> > because the ubsan issue was a Nan reported, where it seems when Nan found
> > in normal such evaluations it'll return FALSE.  I'll need to use:
> >
> >         if (expected < (double)INT64_MAX) {
> >             return (int64_t)expected;
> >         }
> >
> 
> Hmm, how do we get a NaN ?

When mbps is zero (when it reproduces here).  I have the comment in v2
explaining it.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
  2026-05-11 15:29         ` Peter Xu
@ 2026-05-11 15:36           ` Peter Maydell
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Maydell @ 2026-05-11 15:36 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin

On Mon, 11 May 2026 at 16:30, Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, May 11, 2026 at 04:24:24PM +0100, Peter Maydell wrote:
> > > This actually fails to fix the ubsan issue and I can still reproduce,
> > > because the ubsan issue was a Nan reported, where it seems when Nan found
> > > in normal such evaluations it'll return FALSE.  I'll need to use:
> > >
> > >         if (expected < (double)INT64_MAX) {
> > >             return (int64_t)expected;
> > >         }
> > >
> >
> > Hmm, how do we get a NaN ?
>
> When mbps is zero (when it reproduces here).  I have the comment in v2
> explaining it.

Oh, I think I see -- we get the NaN if both s->mbps == 0 and
also mig_stats.dirty_bytes_last_sync == 0, because that's 0 / 0,
which is NaN. If only s->mbps == 0 but dirty_bytes_last_sync is
not zero, then we get +Inf.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-05-11 15:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-11 14:10 [PATCH] migration: Fix possible division by zero on calc expected downtime Peter Xu
2026-05-11 14:26 ` Peter Maydell
2026-05-11 14:51   ` Peter Xu
2026-05-11 15:18     ` Peter Xu
2026-05-11 15:24       ` Peter Maydell
2026-05-11 15:29         ` Peter Xu
2026-05-11 15:36           ` Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.