* [PATCH v2] migration: Fix possible division by zero on calc expected downtime
@ 2026-05-11 15:20 Peter Xu
2026-05-11 15:27 ` Peter Maydell
2026-05-11 15:47 ` Peter Maydell
0 siblings, 2 replies; 6+ messages in thread
From: Peter Xu @ 2026-05-11 15:20 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Peter Maydell, Juraj Marcin, Fabiano Rosas
Commit dd4fe8844b changed the reporting of expected downtime behavior, so
that the value will be calculated on-demand. One side effect on the change
is QEMU will allow the calculation to happen anytime even if there's no
transfer happening for a short while.
PeterM reported an ubsan report from clang when running migration-test with
aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
concurrently so some of the src QEMU may not get chance to push any data,
causing mbps to be 0:
../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
Fix it by properly handle both Inf and Nan. One note is we can't use
">"/">=" check here otherwise we cannot cover Nan.
Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/migration/migration.c b/migration/migration.c
index b6f78eb3ac..e4103cd3f0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s)
/* Return expected downtime (unit: milliseconds) */
int64_t migration_downtime_calc_expected(MigrationState *s)
{
+ double expected_ms;
+
if (mig_stats.dirty_sync_count <= 1) {
return migrate_downtime_limit();
}
- return mig_stats.dirty_bytes_last_sync /
+ expected_ms = mig_stats.dirty_bytes_last_sync /
migration_get_switchover_bw(s) * 1000;
+
+ /*
+ * This "<" check covers two cases where we want to fallback to
+ * INT64_MAX, the 1st case is obvious, but the 2nd is not:
+ *
+ * (1) when expected_ms is Inf, or anything too big for int64_t
+ * (2) when expected_ms is Nan (division by zero), evaluation of this
+ * if clause will be FALSE
+ */
+ if (expected_ms < (double)INT64_MAX) {
+ return (int64_t) expected_ms;
+ }
+
+ return INT64_MAX;
}
static void populate_time_info(MigrationInfo *info, MigrationState *s)
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime
2026-05-11 15:20 [PATCH v2] migration: Fix possible division by zero on calc expected downtime Peter Xu
@ 2026-05-11 15:27 ` Peter Maydell
2026-05-11 15:47 ` Peter Maydell
1 sibling, 0 replies; 6+ messages in thread
From: Peter Maydell @ 2026-05-11 15:27 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas
On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote:
>
> Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> that the value will be calculated on-demand. One side effect on the change
> is QEMU will allow the calculation to happen anytime even if there's no
> transfer happening for a short while.
>
> PeterM reported an ubsan report from clang when running migration-test with
> aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> concurrently so some of the src QEMU may not get chance to push any data,
> causing mbps to be 0:
>
> ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
>
> Fix it by properly handle both Inf and Nan. One note is we can't use
> ">"/">=" check here otherwise we cannot cover Nan.
I don't really understand how we can get a NaN here -- where
is it coming from? The only case I can think of is if s->mbps is
a NaN -- but in that case that probably is causing other problems
elsewhere too, and it would likely be better to do something
where the NaN is first entering the calculations.
thanks
-- PMM
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime
2026-05-11 15:20 [PATCH v2] migration: Fix possible division by zero on calc expected downtime Peter Xu
2026-05-11 15:27 ` Peter Maydell
@ 2026-05-11 15:47 ` Peter Maydell
2026-05-11 17:47 ` Peter Xu
1 sibling, 1 reply; 6+ messages in thread
From: Peter Maydell @ 2026-05-11 15:47 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas
On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote:
>
> Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> that the value will be calculated on-demand. One side effect on the change
> is QEMU will allow the calculation to happen anytime even if there's no
> transfer happening for a short while.
>
> PeterM reported an ubsan report from clang when running migration-test with
> aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> concurrently so some of the src QEMU may not get chance to push any data,
> causing mbps to be 0:
>
> ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
>
> Fix it by properly handle both Inf and Nan. One note is we can't use
> ">"/">=" check here otherwise we cannot cover Nan.
>
> Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index b6f78eb3ac..e4103cd3f0 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s)
> /* Return expected downtime (unit: milliseconds) */
> int64_t migration_downtime_calc_expected(MigrationState *s)
> {
> + double expected_ms;
> +
> if (mig_stats.dirty_sync_count <= 1) {
> return migrate_downtime_limit();
> }
>
> - return mig_stats.dirty_bytes_last_sync /
> + expected_ms = mig_stats.dirty_bytes_last_sync /
> migration_get_switchover_bw(s) * 1000;
> +
> + /*
> + * This "<" check covers two cases where we want to fallback to
> + * INT64_MAX, the 1st case is obvious, but the 2nd is not:
> + *
> + * (1) when expected_ms is Inf, or anything too big for int64_t
> + * (2) when expected_ms is Nan (division by zero), evaluation of this
This should say "zero divided by zero" -- general division by
zero gives Inf, and it's only 0 / 0 that runs into NaN.
> + * if clause will be FALSE
> + */
> + if (expected_ms < (double)INT64_MAX) {
This works, but maybe we should write it out
if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) {
rather than being clever and then needing to write a big comment
to explain it?
But if you'd rather just adjust the comment, your patch's code
change does do the right thing.
-- PMM
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime
2026-05-11 15:47 ` Peter Maydell
@ 2026-05-11 17:47 ` Peter Xu
2026-05-11 18:03 ` Peter Maydell
0 siblings, 1 reply; 6+ messages in thread
From: Peter Xu @ 2026-05-11 17:47 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas
On Mon, May 11, 2026 at 04:47:22PM +0100, Peter Maydell wrote:
> On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote:
> >
> > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > that the value will be calculated on-demand. One side effect on the change
> > is QEMU will allow the calculation to happen anytime even if there's no
> > transfer happening for a short while.
> >
> > PeterM reported an ubsan report from clang when running migration-test with
> > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> > concurrently so some of the src QEMU may not get chance to push any data,
> > causing mbps to be 0:
> >
> > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> >
> > Fix it by properly handle both Inf and Nan. One note is we can't use
> > ">"/">=" check here otherwise we cannot cover Nan.
> >
> > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > migration/migration.c | 18 +++++++++++++++++-
> > 1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index b6f78eb3ac..e4103cd3f0 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s)
> > /* Return expected downtime (unit: milliseconds) */
> > int64_t migration_downtime_calc_expected(MigrationState *s)
> > {
> > + double expected_ms;
> > +
> > if (mig_stats.dirty_sync_count <= 1) {
> > return migrate_downtime_limit();
> > }
> >
> > - return mig_stats.dirty_bytes_last_sync /
> > + expected_ms = mig_stats.dirty_bytes_last_sync /
> > migration_get_switchover_bw(s) * 1000;
> > +
> > + /*
> > + * This "<" check covers two cases where we want to fallback to
> > + * INT64_MAX, the 1st case is obvious, but the 2nd is not:
> > + *
> > + * (1) when expected_ms is Inf, or anything too big for int64_t
> > + * (2) when expected_ms is Nan (division by zero), evaluation of this
>
> This should say "zero divided by zero" -- general division by
> zero gives Inf, and it's only 0 / 0 that runs into NaN.
>
> > + * if clause will be FALSE
> > + */
> > + if (expected_ms < (double)INT64_MAX) {
>
> This works, but maybe we should write it out
> if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) {
I agree using isnan() is better than comment. Though code in the patch for
the next line here is:
+ return (int64_t) expected_ms;
Do you think below should work?
expected_ms = ...;
/* For isnan() (0/0) case, we can return anything; return MAX too */
if (isnan(expected_ms) || expected_ms >= (double)INT64_MAX) {
return INT64_MAX;
}
return (int64_t) expected_ms;
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime
2026-05-11 17:47 ` Peter Xu
@ 2026-05-11 18:03 ` Peter Maydell
2026-05-11 18:21 ` Peter Xu
0 siblings, 1 reply; 6+ messages in thread
From: Peter Maydell @ 2026-05-11 18:03 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas
On Mon, 11 May 2026 at 18:47, Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, May 11, 2026 at 04:47:22PM +0100, Peter Maydell wrote:
> > On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > > that the value will be calculated on-demand. One side effect on the change
> > > is QEMU will allow the calculation to happen anytime even if there's no
> > > transfer happening for a short while.
> > >
> > > PeterM reported an ubsan report from clang when running migration-test with
> > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> > > concurrently so some of the src QEMU may not get chance to push any data,
> > > causing mbps to be 0:
> > >
> > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> > >
> > > Fix it by properly handle both Inf and Nan. One note is we can't use
> > > ">"/">=" check here otherwise we cannot cover Nan.
> > >
> > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > > migration/migration.c | 18 +++++++++++++++++-
> > > 1 file changed, 17 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index b6f78eb3ac..e4103cd3f0 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s)
> > > /* Return expected downtime (unit: milliseconds) */
> > > int64_t migration_downtime_calc_expected(MigrationState *s)
> > > {
> > > + double expected_ms;
> > > +
> > > if (mig_stats.dirty_sync_count <= 1) {
> > > return migrate_downtime_limit();
> > > }
> > >
> > > - return mig_stats.dirty_bytes_last_sync /
> > > + expected_ms = mig_stats.dirty_bytes_last_sync /
> > > migration_get_switchover_bw(s) * 1000;
> > > +
> > > + /*
> > > + * This "<" check covers two cases where we want to fallback to
> > > + * INT64_MAX, the 1st case is obvious, but the 2nd is not:
> > > + *
> > > + * (1) when expected_ms is Inf, or anything too big for int64_t
> > > + * (2) when expected_ms is Nan (division by zero), evaluation of this
> >
> > This should say "zero divided by zero" -- general division by
> > zero gives Inf, and it's only 0 / 0 that runs into NaN.
> >
> > > + * if clause will be FALSE
> > > + */
> > > + if (expected_ms < (double)INT64_MAX) {
> >
> > This works, but maybe we should write it out
> > if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) {
>
> I agree using isnan() is better than comment. Though code in the patch for
> the next line here is:
>
> + return (int64_t) expected_ms;
Oops, yes, I got the sense of the condition wrong.
> Do you think below should work?
>
> expected_ms = ...;
>
> /* For isnan() (0/0) case, we can return anything; return MAX too */
> if (isnan(expected_ms) || expected_ms >= (double)INT64_MAX) {
> return INT64_MAX;
> }
Yes, this will work. But I think rather than "return anything"
we ought to say why what we're returning is a sensible value
for the use case we have. How about:
/*
* If we haven't been able to transfer any data, the result here
* could be NaN (for 0 / 0) or infinity (something else / 0).
* Return INT64_MAX as our best approximation to "this will
* take forever to complete". If the problem is transient
* (e.g. we just haven't started to transfer yet) we'll
* recalculate to a more accurate figure later.
*/
?
-- PMM
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime
2026-05-11 18:03 ` Peter Maydell
@ 2026-05-11 18:21 ` Peter Xu
0 siblings, 0 replies; 6+ messages in thread
From: Peter Xu @ 2026-05-11 18:21 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas
On Mon, May 11, 2026 at 07:03:08PM +0100, Peter Maydell wrote:
> On Mon, 11 May 2026 at 18:47, Peter Xu <peterx@redhat.com> wrote:
> >
> > On Mon, May 11, 2026 at 04:47:22PM +0100, Peter Maydell wrote:
> > > On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so
> > > > that the value will be calculated on-demand. One side effect on the change
> > > > is QEMU will allow the calculation to happen anytime even if there's no
> > > > transfer happening for a short while.
> > > >
> > > > PeterM reported an ubsan report from clang when running migration-test with
> > > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
> > > > concurrently so some of the src QEMU may not get chance to push any data,
> > > > causing mbps to be 0:
> > > >
> > > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
> > > >
> > > > Fix it by properly handle both Inf and Nan. One note is we can't use
> > > > ">"/">=" check here otherwise we cannot cover Nan.
> > > >
> > > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
> > > > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > > migration/migration.c | 18 +++++++++++++++++-
> > > > 1 file changed, 17 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index b6f78eb3ac..e4103cd3f0 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s)
> > > > /* Return expected downtime (unit: milliseconds) */
> > > > int64_t migration_downtime_calc_expected(MigrationState *s)
> > > > {
> > > > + double expected_ms;
> > > > +
> > > > if (mig_stats.dirty_sync_count <= 1) {
> > > > return migrate_downtime_limit();
> > > > }
> > > >
> > > > - return mig_stats.dirty_bytes_last_sync /
> > > > + expected_ms = mig_stats.dirty_bytes_last_sync /
> > > > migration_get_switchover_bw(s) * 1000;
> > > > +
> > > > + /*
> > > > + * This "<" check covers two cases where we want to fallback to
> > > > + * INT64_MAX, the 1st case is obvious, but the 2nd is not:
> > > > + *
> > > > + * (1) when expected_ms is Inf, or anything too big for int64_t
> > > > + * (2) when expected_ms is Nan (division by zero), evaluation of this
> > >
> > > This should say "zero divided by zero" -- general division by
> > > zero gives Inf, and it's only 0 / 0 that runs into NaN.
> > >
> > > > + * if clause will be FALSE
> > > > + */
> > > > + if (expected_ms < (double)INT64_MAX) {
> > >
> > > This works, but maybe we should write it out
> > > if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) {
> >
> > I agree using isnan() is better than comment. Though code in the patch for
> > the next line here is:
> >
> > + return (int64_t) expected_ms;
>
> Oops, yes, I got the sense of the condition wrong.
>
> > Do you think below should work?
> >
> > expected_ms = ...;
> >
> > /* For isnan() (0/0) case, we can return anything; return MAX too */
> > if (isnan(expected_ms) || expected_ms >= (double)INT64_MAX) {
> > return INT64_MAX;
> > }
>
> Yes, this will work. But I think rather than "return anything"
> we ought to say why what we're returning is a sensible value
> for the use case we have. How about:
Just to mention, here when I mentioned "anything", what actually in my mind
is the previous valid value we reported, like before the change of commit
dd4fe8844b5, here we used to have a cache value and only update if we
transferred more than 10k bytes (which itself is a magic value).
But I'm not sure if we need to keep that behavior either..
>
> /*
> * If we haven't been able to transfer any data, the result here
> * could be NaN (for 0 / 0) or infinity (something else / 0).
Theoretically, we can also come to affinity if we sent something small but
the total dirty data is rediculously large, but yeah, I'm OK with this
wording; even if it may not be accurate, it's clear enough to me as a
comment to help reading.
> * Return INT64_MAX as our best approximation to "this will
> * take forever to complete". If the problem is transient
> * (e.g. we just haven't started to transfer yet) we'll
> * recalculate to a more accurate figure later.
> */
>
> ?
I'll use the comment suggested, thanks.
--
Peter Xu
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-05-11 18:22 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-11 15:20 [PATCH v2] migration: Fix possible division by zero on calc expected downtime Peter Xu
2026-05-11 15:27 ` Peter Maydell
2026-05-11 15:47 ` Peter Maydell
2026-05-11 17:47 ` Peter Xu
2026-05-11 18:03 ` Peter Maydell
2026-05-11 18:21 ` Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox