From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dave@treblig.org>
Cc: qemu-devel@nongnu.org, Alexey Perevalov <a.perevalov@samsung.com>,
Juraj Marcin <jmarcin@redhat.com>,
Fabiano Rosas <farosas@suse.de>,
Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v2 08/13] migration/postcopy: Report fault latencies in blocktime
Date: Tue, 10 Jun 2025 09:39:20 -0400
Message-ID: <aEg1iP9iXlYsQP0C@x1.local>
In-Reply-To: <aEd3d07hQYXWc4eq@gallifrey>
On Tue, Jun 10, 2025 at 12:08:23AM +0000, Dr. David Alan Gilbert wrote:
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 4963f6ca12..e95b7402cb 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -236,6 +236,17 @@
> > # This is only present when the postcopy-blocktime migration
> > # capability is enabled. (Since 3.0)
> > #
> > +# @postcopy-latency: average remote page fault latency (in us). Note that
> > +# this doesn't include all faults, but only the ones that require a
> > +# remote page request. So it should be always bigger than the real
> > +# average page fault latency. This is only present when the
> > +# postcopy-blocktime migration capability is enabled. (Since 10.1)
> > +#
> > +# @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
> > +# us). It has the same definition of @postcopy-latency, but instead
> > +# this is the per-vCPU statistics. This is only present when the
> > +# postcopy-blocktime migration capability is enabled. (Since 10.1)
>
> I wonder if even 'us' is too big; given you have 64bits to play with, and your
> examples show some samples landing in under 10us, perhaps it's best
> to at least define the qapi fields as ns, even if you keep with the same
> buckets for now?
The few <10us samples should mostly be outliers. I'd expect they happened
because some faulted pages got lucky and were migrated (in the background
stream rather than the preempt stream) right after the request was sent.
But it's still a fair point, especially since there's nothing to lose by
switching to nanoseconds here when we have 64-bit fields. I also did a
quick check online: it looks like RDMA over a 100Gbps NIC may indeed
complete a round trip within a few microseconds, at least under zero
load.
Let me do the switch in v3.
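
To make the trade-off concrete, here is a minimal sketch of the arithmetic
behind the unit switch. It is not code from the series: the helper names
(avg_latency_ns, ns_to_us, ns_counter_capacity_years) are hypothetical and
only illustrate that a 64-bit nanosecond counter has ample range, while a
microsecond unit truncates sub-microsecond detail.

```c
#include <stdint.h>

#define NS_PER_US   1000ULL
#define NS_PER_YEAR (1000000000ULL * 60 * 60 * 24 * 365)

/* Hypothetical helper: average latency in ns; 0 when nothing was recorded. */
static inline uint64_t avg_latency_ns(uint64_t total_ns, uint64_t faults)
{
    return faults ? total_ns / faults : 0;
}

/* Converting to us truncates; this is the precision a us unit gives up. */
static inline uint64_t ns_to_us(uint64_t ns)
{
    return ns / NS_PER_US;
}

/* Whole years of accumulated time a uint64_t nanosecond counter can hold. */
static inline uint64_t ns_counter_capacity_years(void)
{
    return UINT64_MAX / NS_PER_YEAR;
}
```

For example, two faults of 5000ns and 10000ns average 7500ns, which a
us-based field would report as 7. The nanosecond counter itself only wraps
after roughly 584 years of accumulated latency, so range is not a concern.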
While at it, thinking about possible future unit/format changes in the
report, maybe I should also mark all of these fields experimental from the
start? Then we don't necessarily need to maintain the ABI. The expectation
is that even if a mgmt app wants to fetch them, it should only fetch and
dump them into a log so that a human can read them later, purely for
debugging purposes.
--
Peter Xu