* [PATCH v2] migration: Fix possible division by zero on calc expected downtime
@ 2026-05-11 15:20 Peter Xu
2026-05-11 15:27 ` Peter Maydell
2026-05-11 15:47 ` Peter Maydell
0 siblings, 2 replies; 6+ messages in thread
From: Peter Xu @ 2026-05-11 15:20 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Peter Maydell, Juraj Marcin, Fabiano Rosas
Commit dd4fe8844b changed the reporting of expected downtime behavior, so
that the value will be calculated on-demand. One side effect on the change
is QEMU will allow the calculation to happen anytime even if there's no
transfer happening for a short while.
PeterM reported an ubsan report from clang when running migration-test with
aarch64 binary on x86_64 hosts. I can also reproduce if I run the test
concurrently so some of the src QEMU may not get chance to push any data,
causing mbps to be 0:
../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long'
Fix it by properly handle both Inf and Nan. One note is we can't use
">"/">=" check here otherwise we cannot cover Nan.
Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/migration/migration.c b/migration/migration.c
index b6f78eb3ac..e4103cd3f0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s)
/* Return expected downtime (unit: milliseconds) */
int64_t migration_downtime_calc_expected(MigrationState *s)
{
+ double expected_ms;
+
if (mig_stats.dirty_sync_count <= 1) {
return migrate_downtime_limit();
}
- return mig_stats.dirty_bytes_last_sync /
+ expected_ms = mig_stats.dirty_bytes_last_sync /
migration_get_switchover_bw(s) * 1000;
+
+ /*
+ * This "<" check covers two cases where we want to fallback to
+ * INT64_MAX, the 1st case is obvious, but the 2nd is not:
+ *
+ * (1) when expected_ms is Inf, or anything too big for int64_t
+ * (2) when expected_ms is Nan (division by zero), evaluation of this
+ * if clause will be FALSE
+ */
+ if (expected_ms < (double)INT64_MAX) {
+ return (int64_t) expected_ms;
+ }
+
+ return INT64_MAX;
}
static void populate_time_info(MigrationInfo *info, MigrationState *s)
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime 2026-05-11 15:20 [PATCH v2] migration: Fix possible division by zero on calc expected downtime Peter Xu @ 2026-05-11 15:27 ` Peter Maydell 2026-05-11 15:47 ` Peter Maydell 1 sibling, 0 replies; 6+ messages in thread From: Peter Maydell @ 2026-05-11 15:27 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote: > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so > that the value will be calculated on-demand. One side effect on the change > is QEMU will allow the calculation to happen anytime even if there's no > transfer happening for a short while. > > PeterM reported an ubsan report from clang when running migration-test with > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test > concurrently so some of the src QEMU may not get chance to push any data, > causing mbps to be 0: > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long' > > Fix it by properly handle both Inf and Nan. One note is we can't use > ">"/">=" check here otherwise we cannot cover Nan. I don't really understand how we can get a NaN here -- where is it coming from? The only case I can think of is if s->mbps is a NaN -- but in that case that probably is causing other problems elsewhere too, and it would likely be better to do something where the NaN is first entering the calculations. thanks -- PMM ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime 2026-05-11 15:20 [PATCH v2] migration: Fix possible division by zero on calc expected downtime Peter Xu 2026-05-11 15:27 ` Peter Maydell @ 2026-05-11 15:47 ` Peter Maydell 2026-05-11 17:47 ` Peter Xu 1 sibling, 1 reply; 6+ messages in thread From: Peter Maydell @ 2026-05-11 15:47 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote: > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so > that the value will be calculated on-demand. One side effect on the change > is QEMU will allow the calculation to happen anytime even if there's no > transfer happening for a short while. > > PeterM reported an ubsan report from clang when running migration-test with > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test > concurrently so some of the src QEMU may not get chance to push any data, > causing mbps to be 0: > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long' > > Fix it by properly handle both Inf and Nan. One note is we can't use > ">"/">=" check here otherwise we cannot cover Nan. > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com > Reported-by: Peter Maydell <peter.maydell@linaro.org> > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand") > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > migration/migration.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c > index b6f78eb3ac..e4103cd3f0 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s) > /* Return expected downtime (unit: milliseconds) */ > int64_t migration_downtime_calc_expected(MigrationState *s) > { > + double expected_ms; > + > if (mig_stats.dirty_sync_count <= 1) { > return migrate_downtime_limit(); > } > > - return mig_stats.dirty_bytes_last_sync / > + expected_ms = mig_stats.dirty_bytes_last_sync / > migration_get_switchover_bw(s) * 1000; > + > + /* > + * This "<" check covers two cases where we want to fallback to > + * INT64_MAX, the 1st case is obvious, but the 2nd is not: > + * > + * (1) when expected_ms is Inf, or anything too big for int64_t > + * (2) when expected_ms is Nan (division by zero), evaluation of this This should say "zero divided by zero" -- general division by zero gives Inf, and it's only 0 / 0 that runs into NaN. > + * if clause will be FALSE > + */ > + if (expected_ms < (double)INT64_MAX) { This works, but maybe we should write it out if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) { rather than being clever and then needing to write a big comment to explain it? But if you'd rather just adjust the comment, your patch's code change does do the right thing. -- PMM ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime 2026-05-11 15:47 ` Peter Maydell @ 2026-05-11 17:47 ` Peter Xu 2026-05-11 18:03 ` Peter Maydell 0 siblings, 1 reply; 6+ messages in thread From: Peter Xu @ 2026-05-11 17:47 UTC (permalink / raw) To: Peter Maydell; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas On Mon, May 11, 2026 at 04:47:22PM +0100, Peter Maydell wrote: > On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote: > > > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so > > that the value will be calculated on-demand. One side effect on the change > > is QEMU will allow the calculation to happen anytime even if there's no > > transfer happening for a short while. > > > > PeterM reported an ubsan report from clang when running migration-test with > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test > > concurrently so some of the src QEMU may not get chance to push any data, > > causing mbps to be 0: > > > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long' > > > > Fix it by properly handle both Inf and Nan. One note is we can't use > > ">"/">=" check here otherwise we cannot cover Nan. > > > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand") > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > migration/migration.c | 18 +++++++++++++++++- > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > diff --git a/migration/migration.c b/migration/migration.c > > index b6f78eb3ac..e4103cd3f0 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s) > > /* Return expected downtime (unit: milliseconds) */ > > int64_t migration_downtime_calc_expected(MigrationState *s) > > { > > + double expected_ms; > > + > > if (mig_stats.dirty_sync_count <= 1) { > > return migrate_downtime_limit(); > > } > > > > - return mig_stats.dirty_bytes_last_sync / > > + expected_ms = mig_stats.dirty_bytes_last_sync / > > migration_get_switchover_bw(s) * 1000; > > + > > + /* > > + * This "<" check covers two cases where we want to fallback to > > + * INT64_MAX, the 1st case is obvious, but the 2nd is not: > > + * > > + * (1) when expected_ms is Inf, or anything too big for int64_t > > + * (2) when expected_ms is Nan (division by zero), evaluation of this > > This should say "zero divided by zero" -- general division by > zero gives Inf, and it's only 0 / 0 that runs into NaN. > > > + * if clause will be FALSE > > + */ > > + if (expected_ms < (double)INT64_MAX) { > > This works, but maybe we should write it out > if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) { I agree using isnan() is better than comment. Though code in the patch for the next line here is: + return (int64_t) expected_ms; Do you think below should work? expected_ms = ...; /* For isnan() (0/0) case, we can return anything; return MAX too */ if (isnan(expected_ms) || expected_ms >= (double)INT64_MAX) { return INT64_MAX; } return (int64_t) expected_ms; Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime 2026-05-11 17:47 ` Peter Xu @ 2026-05-11 18:03 ` Peter Maydell 2026-05-11 18:21 ` Peter Xu 0 siblings, 1 reply; 6+ messages in thread From: Peter Maydell @ 2026-05-11 18:03 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas On Mon, 11 May 2026 at 18:47, Peter Xu <peterx@redhat.com> wrote: > > On Mon, May 11, 2026 at 04:47:22PM +0100, Peter Maydell wrote: > > On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote: > > > > > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so > > > that the value will be calculated on-demand. One side effect on the change > > > is QEMU will allow the calculation to happen anytime even if there's no > > > transfer happening for a short while. > > > > > > PeterM reported an ubsan report from clang when running migration-test with > > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test > > > concurrently so some of the src QEMU may not get chance to push any data, > > > causing mbps to be 0: > > > > > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long' > > > > > > Fix it by properly handle both Inf and Nan. One note is we can't use > > > ">"/">=" check here otherwise we cannot cover Nan. > > > > > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com > > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand") > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > --- > > > migration/migration.c | 18 +++++++++++++++++- > > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > index b6f78eb3ac..e4103cd3f0 100644 > > > --- a/migration/migration.c > > > +++ b/migration/migration.c > > > @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s) > > > /* Return expected downtime (unit: milliseconds) */ > > > int64_t migration_downtime_calc_expected(MigrationState *s) > > > { > > > + double expected_ms; > > > + > > > if (mig_stats.dirty_sync_count <= 1) { > > > return migrate_downtime_limit(); > > > } > > > > > > - return mig_stats.dirty_bytes_last_sync / > > > + expected_ms = mig_stats.dirty_bytes_last_sync / > > > migration_get_switchover_bw(s) * 1000; > > > + > > > + /* > > > + * This "<" check covers two cases where we want to fallback to > > > + * INT64_MAX, the 1st case is obvious, but the 2nd is not: > > > + * > > > + * (1) when expected_ms is Inf, or anything too big for int64_t > > > + * (2) when expected_ms is Nan (division by zero), evaluation of this > > > > This should say "zero divided by zero" -- general division by > > zero gives Inf, and it's only 0 / 0 that runs into NaN. > > > > > + * if clause will be FALSE > > > + */ > > > + if (expected_ms < (double)INT64_MAX) { > > > > This works, but maybe we should write it out > > if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) { > > I agree using isnan() is better than comment. Though code in the patch for > the next line here is: > > + return (int64_t) expected_ms; Oops, yes, I got the sense of the condition wrong. > Do you think below should work? > > expected_ms = ...; > > /* For isnan() (0/0) case, we can return anything; return MAX too */ > if (isnan(expected_ms) || expected_ms >= (double)INT64_MAX) { > return INT64_MAX; > } Yes, this will work. But I think rather than "return anything" we ought to say why what we're returning is a sensible value for the use case we have. How about: /* * If we haven't been able to transfer any data, the result here * could be NaN (for 0 / 0) or infinity (something else / 0). * Return INT64_MAX as our best approximation to "this will * take forever to complete". If the problem is transient * (e.g. we just haven't started to transfer yet) we'll * recalculate to a more accurate figure later. */ ? -- PMM ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] migration: Fix possible division by zero on calc expected downtime 2026-05-11 18:03 ` Peter Maydell @ 2026-05-11 18:21 ` Peter Xu 0 siblings, 0 replies; 6+ messages in thread From: Peter Xu @ 2026-05-11 18:21 UTC (permalink / raw) To: Peter Maydell; +Cc: qemu-devel, Juraj Marcin, Fabiano Rosas On Mon, May 11, 2026 at 07:03:08PM +0100, Peter Maydell wrote: > On Mon, 11 May 2026 at 18:47, Peter Xu <peterx@redhat.com> wrote: > > > > On Mon, May 11, 2026 at 04:47:22PM +0100, Peter Maydell wrote: > > > On Mon, 11 May 2026 at 16:20, Peter Xu <peterx@redhat.com> wrote: > > > > > > > > Commit dd4fe8844b changed the reporting of expected downtime behavior, so > > > > that the value will be calculated on-demand. One side effect on the change > > > > is QEMU will allow the calculation to happen anytime even if there's no > > > > transfer happening for a short while. > > > > > > > > PeterM reported an ubsan report from clang when running migration-test with > > > > aarch64 binary on x86_64 hosts. I can also reproduce if I run the test > > > > concurrently so some of the src QEMU may not get chance to push any data, > > > > causing mbps to be 0: > > > > > > > > ../migration/migration.c:1051:12: runtime error: -nan is outside the range of representable values of type 'long' > > > > > > > > Fix it by properly handle both Inf and Nan. One note is we can't use > > > > ">"/">=" check here otherwise we cannot cover Nan. > > > > > > > > Link: https://lore.kernel.org/r/CAFEAcA-MYH6C39xO0OLx4-M5pKurJpurwRsMqZe9q=W-NShAbw@mail.gmail.com > > > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > > > > Fixes: dd4fe8844b ("migration: Calculate expected downtime on demand") > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > --- > > > > migration/migration.c | 18 +++++++++++++++++- > > > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > > index b6f78eb3ac..e4103cd3f0 100644 > > > > --- a/migration/migration.c > > > > +++ b/migration/migration.c > > > > @@ -1044,12 +1044,28 @@ static bool migrate_show_downtime(MigrationState *s) > > > > /* Return expected downtime (unit: milliseconds) */ > > > > int64_t migration_downtime_calc_expected(MigrationState *s) > > > > { > > > > + double expected_ms; > > > > + > > > > if (mig_stats.dirty_sync_count <= 1) { > > > > return migrate_downtime_limit(); > > > > } > > > > > > > > - return mig_stats.dirty_bytes_last_sync / > > > > + expected_ms = mig_stats.dirty_bytes_last_sync / > > > > migration_get_switchover_bw(s) * 1000; > > > > + > > > > + /* > > > > + * This "<" check covers two cases where we want to fallback to > > > > + * INT64_MAX, the 1st case is obvious, but the 2nd is not: > > > > + * > > > > + * (1) when expected_ms is Inf, or anything too big for int64_t > > > > + * (2) when expected_ms is Nan (division by zero), evaluation of this > > > > > > This should say "zero divided by zero" -- general division by > > > zero gives Inf, and it's only 0 / 0 that runs into NaN. > > > > > > > + * if clause will be FALSE > > > > + */ > > > > + if (expected_ms < (double)INT64_MAX) { > > > > > > This works, but maybe we should write it out > > > if (isnan(expected_ms) || expected_ms < (double)INT64_MAX) { > > > > I agree using isnan() is better than comment. Though code in the patch for > > the next line here is: > > > > + return (int64_t) expected_ms; > > Oops, yes, I got the sense of the condition wrong. > > > Do you think below should work? > > > > expected_ms = ...; > > > > /* For isnan() (0/0) case, we can return anything; return MAX too */ > > if (isnan(expected_ms) || expected_ms >= (double)INT64_MAX) { > > return INT64_MAX; > > } > > Yes, this will work. But I think rather than "return anything" > we ought to say why what we're returning is a sensible value > for the use case we have. How about: Just to mention, here when I mentioned "anything", what actually in my mind is the previous valid value we reported, like before the change of commit dd4fe8844b5, here we used to have a cache value and only update if we transferred more than 10k bytes (which itself is a magic value). But I'm not sure if we need to keep that behavior either.. > > /* > * If we haven't been able to transfer any data, the result here > * could be NaN (for 0 / 0) or infinity (something else / 0). Theoretically, we can also come to affinity if we sent something small but the total dirty data is rediculously large, but yeah, I'm OK with this wording; even if it may not be accurate, it's clear enough to me as a comment to help reading. > * Return INT64_MAX as our best approximation to "this will > * take forever to complete". If the problem is transient > * (e.g. we just haven't started to transfer yet) we'll > * recalculate to a more accurate figure later. > */ > > ? I'll use the comment suggested, thanks. -- Peter Xu ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-05-11 18:22 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-11 15:20 [PATCH v2] migration: Fix possible division by zero on calc expected downtime Peter Xu 2026-05-11 15:27 ` Peter Maydell 2026-05-11 15:47 ` Peter Maydell 2026-05-11 17:47 ` Peter Xu 2026-05-11 18:03 ` Peter Maydell 2026-05-11 18:21 ` Peter Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox