* [PATCH] migration: Fix possible division by zero on calc expected downtime
@ 2026-05-11 14:10 Peter Xu
2026-05-11 14:26 ` Peter Maydell
0 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2026-05-11 14:10 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Juraj Marcin, Peter Maydell
Commit dd4fe8844b changed the reporting of expected downtime behavior, so
that the value will be calculated on-demand. One side effect on the change
is QEMU will allow the calculation to happen anytime even if there's no
transfer happening for a short while.
PeterM reported an ubsan report from clang when running migration-test with
aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
concurrently so some of the src QEMU may not get chance to push any data,
causing mbps to be 0:
../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
Fix it by checking mbps non-zero before the devision.
Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index b6f78eb3ac..b06e577328 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
/* Return expected downtime (unit: milliseconds) */
int64_t migration_downtime_calc_expected(MigrationState *s)
{
+ double bw;
+
if (mig_stats.dirty_sync_count <= 1) {
return migrate_downtime_limit();
}
- return mig_stats.dirty_bytes_last_sync /
- migration_get_switchover_bw(s) * 1000;
+ bw = migration_get_switchover_bw(s) * 1000;
+ if (!bw) {
+ return INT64_MAX;
+ }
+
+ return mig_stats.dirty_bytes_last_sync / bw;
}
static void populate_time_info(MigrationInfo *info, MigrationState *s)
--
2.53.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
2026-05-11 14:10 [PATCH] migration: Fix possible division by zero on calc expected downtime Peter Xu
@ 2026-05-11 14:26 ` Peter Maydell
2026-05-11 14:51 ` Peter Xu
0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2026-05-11 14:26 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin
On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
>
> Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> that the value will be calculated on-demand. One side effect on the change
> is QEMU will allow the calculation to happen anytime even if there's no
> transfer happening for a short while.
>
> PeterM reported an ubsan report from clang when running migration-test with
> aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> concurrently so some of the src QEMU may not get chance to push any data,
> causing mbps to be 0:
>
> ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
>
> Fix it by checking mbps non-zero before the devision.
>
> Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index b6f78eb3ac..b06e577328 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
> /* Return expected downtime (unit: milliseconds) */
> int64_t migration_downtime_calc_expected(MigrationState *s)
> {
> + double bw;
> +
> if (mig_stats.dirty_sync_count <= 1) {
> return migrate_downtime_limit();
> }
>
> - return mig_stats.dirty_bytes_last_sync /
> - migration_get_switchover_bw(s) * 1000;
> + bw = migration_get_switchover_bw(s) * 1000;
> + if (!bw) {
> + return INT64_MAX;
> + }
> +
> + return mig_stats.dirty_bytes_last_sync / bw;
I think this is still UB if bw is a value that's nonzero
but still small enough that mig_stats.dirty_bytes_last_sync / bw
is greater than INT64_MAX. It might be more robust to do
something like:
double expected = mig_stats.dirty_bytes_last_sync /
migration_get_switchover_bw(s) * 1000;
if (expected >= (double)INT64_MAX) {
return INT64_MAX;
}
return (int64_t)expected;
(which will also handle the infinity case, because
in IEEE fp infinity is larger than any finite value).
-- PMM
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
2026-05-11 14:26 ` Peter Maydell
@ 2026-05-11 14:51 ` Peter Xu
2026-05-11 15:18 ` Peter Xu
0 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2026-05-11 14:51 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin
On Mon, May 11, 2026 at 03:26:06PM +0100, Peter Maydell wrote:
> On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
> >
> > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > that the value will be calculated on-demand. One side effect on the change
> > is QEMU will allow the calculation to happen anytime even if there's no
> > transfer happening for a short while.
> >
> > PeterM reported an ubsan report from clang when running migration-test with
> > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> > concurrently so some of the src QEMU may not get chance to push any data,
> > causing mbps to be 0:
> >
> > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> >
> > Fix it by checking mbps non-zero before the devision.
> >
> > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > migration/migration.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index b6f78eb3ac..b06e577328 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
> > /* Return expected downtime (unit: milliseconds) */
> > int64_t migration_downtime_calc_expected(MigrationState *s)
> > {
> > + double bw;
> > +
> > if (mig_stats.dirty_sync_count <= 1) {
> > return migrate_downtime_limit();
> > }
> >
> > - return mig_stats.dirty_bytes_last_sync /
> > - migration_get_switchover_bw(s) * 1000;
> > + bw = migration_get_switchover_bw(s) * 1000;
> > + if (!bw) {
> > + return INT64_MAX;
> > + }
> > +
> > + return mig_stats.dirty_bytes_last_sync / bw;
>
> I think this is still UB if bw is a value that's nonzero
> but still small enough that mig_stats.dirty_bytes_last_sync / bw
> is greater than INT64_MAX. It might be more robust to do
> something like:
>
> double expected = mig_stats.dirty_bytes_last_sync /
> migration_get_switchover_bw(s) * 1000;
>
> if (expected >= (double)INT64_MAX) {
> return INT64_MAX;
> }
> return (int64_t)expected;
>
> (which will also handle the infinity case, because
> in IEEE fp infinity is larger than any finite value).
I almost treated mbps as integer, because when it's non-zero it almost
should be at least ~10byte/s (our unit is (uint)transferred / 100ms).. so I
doubt if the issue can happen. But indeed it's theoretically flawed.
I'll repost, thanks for the careful review, Peter.
--
Peter Xu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
2026-05-11 14:51 ` Peter Xu
@ 2026-05-11 15:18 ` Peter Xu
2026-05-11 15:24 ` Peter Maydell
0 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2026-05-11 15:18 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin
On Mon, May 11, 2026 at 10:51:00AM -0400, Peter Xu wrote:
> On Mon, May 11, 2026 at 03:26:06PM +0100, Peter Maydell wrote:
> > On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > > that the value will be calculated on-demand. One side effect on the change
> > > is QEMU will allow the calculation to happen anytime even if there's no
> > > transfer happening for a short while.
> > >
> > > PeterM reported an ubsan report from clang when running migration-test with
> > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> > > concurrently so some of the src QEMU may not get chance to push any data,
> > > causing mbps to be 0:
> > >
> > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> > >
> > > Fix it by checking mbps non-zero before the devision.
> > >
> > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > > migration/migration.c | 10 ++++++++--
> > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index b6f78eb3ac..b06e577328 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
> > > /* Return expected downtime (unit: milliseconds) */
> > > int64_t migration_downtime_calc_expected(MigrationState *s)
> > > {
> > > + double bw;
> > > +
> > > if (mig_stats.dirty_sync_count <= 1) {
> > > return migrate_downtime_limit();
> > > }
> > >
> > > - return mig_stats.dirty_bytes_last_sync /
> > > - migration_get_switchover_bw(s) * 1000;
> > > + bw = migration_get_switchover_bw(s) * 1000;
PS: I think I got the math wrong, not that it matters anymore.. :(
> > > + if (!bw) {
> > > + return INT64_MAX;
> > > + }
> > > +
> > > + return mig_stats.dirty_bytes_last_sync / bw;
> >
> > I think this is still UB if bw is a value that's nonzero
> > but still small enough that mig_stats.dirty_bytes_last_sync / bw
> > is greater than INT64_MAX. It might be more robust to do
> > something like:
> >
> > double expected = mig_stats.dirty_bytes_last_sync /
> > migration_get_switchover_bw(s) * 1000;
> >
> > if (expected >= (double)INT64_MAX) {
> > return INT64_MAX;
> > }
This actually fails to fix the ubsan issue and I can still reproduce,
because the ubsan issue was a Nan reported, where it seems when Nan found
in normal such evaluations it'll return FALSE. I'll need to use:
if (expected < (double)INT64_MAX) {
return (int64_t)expected;
}
I'll add a rich comment when repost.
> > return (int64_t)expected;
> >
> > (which will also handle the infinity case, because
> > in IEEE fp infinity is larger than any finite value).
>
> I almost treated mbps as integer, because when it's non-zero it almost
> should be at least ~10byte/s (our unit is (uint)transferred / 100ms).. so I
> doubt if the issue can happen. But indeed it's theoretically flawed.
>
> I'll repost, thanks for the careful review, Peter.
>
> --
> Peter Xu
--
Peter Xu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
2026-05-11 15:18 ` Peter Xu
@ 2026-05-11 15:24 ` Peter Maydell
2026-05-11 15:29 ` Peter Xu
0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2026-05-11 15:24 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin
On Mon, 11 May 2026 at 16:18, Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, May 11, 2026 at 10:51:00AM -0400, Peter Xu wrote:
> > On Mon, May 11, 2026 at 03:26:06PM +0100, Peter Maydell wrote:
> > > On Mon, 11 May 2026 at 15:10, Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > > > that the value will be calculated on-demand. One side effect on the change
> > > > is QEMU will allow the calculation to happen anytime even if there's no
> > > > transfer happening for a short while.
> > > >
> > > > PeterM reported an ubsan report from clang when running migration-test with
> > > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> > > > concurrently so some of the src QEMU may not get chance to push any data,
> > > > causing mbps to be 0:
> > > >
> > > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> > > >
> > > > Fix it by checking mbps non-zero before the devision.
> > > >
> > > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > > > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > > migration/migration.c | 10 ++++++++--
> > > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index b6f78eb3ac..b06e577328 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -1044,12 +1044,18 @@ static bool migrate_show_downtime(MigrationState *s)
> > > > /* Return expected downtime (unit: milliseconds) */
> > > > int64_t migration_downtime_calc_expected(MigrationState *s)
> > > > {
> > > > + double bw;
> > > > +
> > > > if (mig_stats.dirty_sync_count <= 1) {
> > > > return migrate_downtime_limit();
> > > > }
> > > >
> > > > - return mig_stats.dirty_bytes_last_sync /
> > > > - migration_get_switchover_bw(s) * 1000;
> > > > + bw = migration_get_switchover_bw(s) * 1000;
>
> PS: I think I got the math wrong, not that it matters anymore.. :(
>
> > > > + if (!bw) {
> > > > + return INT64_MAX;
> > > > + }
> > > > +
> > > > + return mig_stats.dirty_bytes_last_sync / bw;
> > >
> > > I think this is still UB if bw is a value that's nonzero
> > > but still small enough that mig_stats.dirty_bytes_last_sync / bw
> > > is greater than INT64_MAX. It might be more robust to do
> > > something like:
> > >
> > > double expected = mig_stats.dirty_bytes_last_sync /
> > > migration_get_switchover_bw(s) * 1000;
> > >
> > > if (expected >= (double)INT64_MAX) {
> > > return INT64_MAX;
> > > }
>
> This actually fails to fix the ubsan issue and I can still reproduce,
> because the ubsan issue was a Nan reported, where it seems when Nan found
> in normal such evaluations it'll return FALSE. I'll need to use:
>
> if (expected < (double)INT64_MAX) {
> return (int64_t)expected;
> }
>
Hmm, how do we get a NaN ?
-- PMM
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
2026-05-11 15:24 ` Peter Maydell
@ 2026-05-11 15:29 ` Peter Xu
2026-05-11 15:36 ` Peter Maydell
0 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2026-05-11 15:29 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin
On Mon, May 11, 2026 at 04:24:24PM +0100, Peter Maydell wrote:
> > This actually fails to fix the ubsan issue and I can still reproduce,
> > because the ubsan issue was a Nan reported, where it seems when Nan found
> > in normal such evaluations it'll return FALSE. I'll need to use:
> >
> > if (expected < (double)INT64_MAX) {
> > return (int64_t)expected;
> > }
> >
>
> Hmm, how do we get a NaN ?
When mbps is zero (when it reproduces here). I have the comment in v2
explaining it.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] migration: Fix possible division by zero on calc expected downtime
2026-05-11 15:29 ` Peter Xu
@ 2026-05-11 15:36 ` Peter Maydell
0 siblings, 0 replies; 7+ messages in thread
From: Peter Maydell @ 2026-05-11 15:36 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Juraj Marcin
On Mon, 11 May 2026 at 16:30, Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, May 11, 2026 at 04:24:24PM +0100, Peter Maydell wrote:
> > > This actually fails to fix the ubsan issue and I can still reproduce,
> > > because the ubsan issue was a Nan reported, where it seems when Nan found
> > > in normal such evaluations it'll return FALSE. I'll need to use:
> > >
> > > if (expected < (double)INT64_MAX) {
> > > return (int64_t)expected;
> > > }
> > >
> >
> > Hmm, how do we get a NaN ?
>
> When mbps is zero (when it reproduces here). I have the comment in v2
> explaining it.
Oh, I think I see -- we get the NaN if both s->mbps == 0 and
also mig_stats.dirty_bytes_last_sync == 0, because that's 0 / 0,
which is NaN. If only s->mbps == 0 but dirty_bytes_last_sync is
not zero, then we get +Inf.
thanks
-- PMM
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-05-11 15:37 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-11 14:10 [PATCH] migration: Fix possible division by zero on calc expected downtime Peter Xu
2026-05-11 14:26 ` Peter Maydell
2026-05-11 14:51 ` Peter Xu
2026-05-11 15:18 ` Peter Xu
2026-05-11 15:24 ` Peter Maydell
2026-05-11 15:29 ` Peter Xu
2026-05-11 15:36 ` Peter Maydell
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.