From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Alexey Perevalov <a.perevalov@samsung.com>,
Juraj Marcin <jmarcin@redhat.com>,
Fabiano Rosas <farosas@suse.de>,
Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v2 08/13] migration/postcopy: Report fault latencies in blocktime
Date: Tue, 10 Jun 2025 13:53:40 +0000
Message-ID: <aEg45Bptc4QGq5gK@gallifrey>
In-Reply-To: <aEg1iP9iXlYsQP0C@x1.local>
* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Jun 10, 2025 at 12:08:23AM +0000, Dr. David Alan Gilbert wrote:
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 4963f6ca12..e95b7402cb 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -236,6 +236,17 @@
> > > # This is only present when the postcopy-blocktime migration
> > > # capability is enabled. (Since 3.0)
> > > #
> > > +# @postcopy-latency: average remote page fault latency (in us). Note that
> > > +# this doesn't include all faults, but only the ones that require a
> > > +# remote page request. So it should be always bigger than the real
> > > +# average page fault latency. This is only present when the
> > > +# postcopy-blocktime migration capability is enabled. (Since 10.1)
> > > +#
> > > +# @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
> > > +# us). It has the same definition of @postcopy-latency, but instead
> > > +# this is the per-vCPU statistics. This is only present when the
> > > +# postcopy-blocktime migration capability is enabled. (Since 10.1)
> >
> > I wonder if even 'us' is too big; given you have 64bits to play with, and your
> > examples show some samples landing in under 10us, perhaps it's best
> > to at least define the qapi fields as ns, even if you keep with the same
> > buckets for now?
>
> The few <10us ones should pretty much be outliers, I'd expect it happened
> because some faulted pages got lucky to be migrated (in the background
> stream rather than the preempt stream) right after sending the request.
>
> But it's still a fair point, especially if there's nothing to lose by
> switching to nanoseconds here when we have 64-bit fields. I also did a quick
> check online; it looks like RDMA over a 100Gbps NIC may indeed complete a
> round trip within a few microseconds, at least with zero
> load.
>
> Let me do the switch in v3.
>
> While at it, when thinking of possible future unit/format changes in the
> report, maybe I should also mark all of these fields experimental from the
> start? So we don't necessarily need to maintain the ABI - the expectation
> is even if a mgmt would like to fetch those they should only fetch and dump
> it into log so that human can read later only for debugging purposes.
Yeh I think that's OK, although perhaps another way would be to add
a field indicating the time of the first bucket; i.e. you could specify
that all the values are in ns, but have first-bucket=1000 to be exactly
the same as you have it now.
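To make that concrete, here is a rough sketch of the idea. Everything in it
is an assumption for illustration only: the LatencyHist struct, the
function names, the bucket count, and the power-of-two bucket widths are
hypothetical and not taken from the patch. The only point is that the
values stay in ns while first_bucket_ns decides where the first bucket ends.

```c
#include <stdint.h>

#define LAT_BUCKETS 16  /* hypothetical bucket count */

/*
 * Hypothetical report layout: every value is in nanoseconds, and
 * first_bucket_ns says where bucket 0 ends.
 */
typedef struct {
    uint64_t first_bucket_ns;
    uint64_t counts[LAT_BUCKETS];
} LatencyHist;

/*
 * Assumed bucket layout (power-of-two widths):
 *   bucket 0:            [0, first)
 *   bucket i (i >= 1):   [first << (i - 1), first << i)
 *   last bucket:         open-ended
 * Assumes first_bucket_ns is non-zero and small enough that
 * first << (LAT_BUCKETS - 1) does not overflow 64 bits.
 */
static unsigned lat_bucket_index(const LatencyHist *h, uint64_t latency_ns)
{
    uint64_t bound = h->first_bucket_ns;
    unsigned i = 0;

    while (i < LAT_BUCKETS - 1 && latency_ns >= bound) {
        bound <<= 1;
        i++;
    }
    return i;
}

/* Account one fault latency sample, given in nanoseconds. */
static void lat_record(LatencyHist *h, uint64_t latency_ns)
{
    h->counts[lat_bucket_index(h, latency_ns)]++;
}
```

With first_bucket_ns = 1000 the first boundary sits at 1us, so
microsecond-granularity buckets come out unchanged; a smaller value later
would give finer buckets without changing the unit of the reported fields.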
Dave
> --
> Peter Xu
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
Thread overview: 20+ messages
2025-06-09 19:12 [PATCH v2 00/13] migration/postcopy: Blocktime tracking overhaul Peter Xu
2025-06-09 19:12 ` [PATCH v2 01/13] migration: Add option to set postcopy-blocktime Peter Xu
2025-06-09 19:12 ` [PATCH v2 02/13] migration/postcopy: Push blocktime start/end into page req mutex Peter Xu
2025-06-09 19:12 ` [PATCH v2 03/13] migration/postcopy: Drop all atomic ops in blocktime feature Peter Xu
2025-06-09 19:12 ` [PATCH v2 04/13] migration/postcopy: Make all blocktime vars 64bits Peter Xu
2025-06-09 19:12 ` [PATCH v2 05/13] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Peter Xu
2025-06-09 19:12 ` [PATCH v2 06/13] migration/postcopy: Bring blocktime layer to us level Peter Xu
2025-06-09 19:12 ` [PATCH v2 07/13] migration/postcopy: Add blocktime fault counts per-vcpu Peter Xu
2025-06-09 19:12 ` [PATCH v2 08/13] migration/postcopy: Report fault latencies in blocktime Peter Xu
2025-06-09 22:05 ` Peter Xu
2025-06-09 22:25 ` Peter Xu
2025-06-10 0:08 ` Dr. David Alan Gilbert
2025-06-10 13:39 ` Peter Xu
2025-06-10 13:53 ` Dr. David Alan Gilbert [this message]
2025-06-10 14:08 ` Peter Xu
2025-06-09 19:12 ` [PATCH v2 09/13] migration/postcopy: Initialize blocktime context only until listen Peter Xu
2025-06-09 19:12 ` [PATCH v2 10/13] migration/postcopy: Cache the tid->vcpu mapping for blocktime Peter Xu
2025-06-09 19:12 ` [PATCH v2 11/13] migration/postcopy: Cleanup the total blocktime accounting Peter Xu
2025-06-09 19:12 ` [PATCH v2 12/13] migration/postcopy: Optimize blocktime fault tracking with hashtable Peter Xu
2025-06-09 19:12 ` [PATCH v2 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults Peter Xu