From: Peter Xu <peterx@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org,
	"Dr . David Alan Gilbert" <dave@treblig.org>,
	Alexey Perevalov <a.perevalov@samsung.com>,
	Fabiano Rosas <farosas@suse.de>,
	Juraj Marcin <jmarcin@redhat.com>
Subject: Re: [PATCH 08/13] migration/postcopy: Report fault latencies in blocktime
Date: Mon, 2 Jun 2025 12:29:18 -0400
Message-ID: <aD3RXsco8yR2mDV2@x1.local>
In-Reply-To: <8734cilcj7.fsf@pond.sub.org>

On Mon, Jun 02, 2025 at 11:26:36AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Blocktime so far only cares about the time one vcpu (or the whole system)
> > got blocked.  It would be also be helpful if it can also report the latency
> > of page requests, which could be very sensitive during postcopy.
> >
> > Blocktime itself is sometimes not very important, especially when one
> > thinks about KVM async PF support, which means vCPUs are literally almost
> > not blocked at all because the guest OS is smart enough to switch to
> > another task when a remote fault is needed.
> >
> > However, latency is still sensitive and important because even if the guest
> > vCPU is running on threads that do not need a remote fault, the workload
> > that accesses some missing page is still affected.
> >
> > Add two entries to the report, showing how long it takes to resolve a
> > remote fault.  Mention in the QAPI doc that this is not the real average
> > fault latency, but only the ones that was requested for a remote fault.
> >
> > Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice,
> > meanwhile add the entry checks in qtests for all postcopy tests.
> >
> > Cc: Markus Armbruster <armbru@redhat.com>
> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  qapi/migration.json                   | 13 +++++
> >  migration/migration-hmp-cmds.c        | 70 ++++++++++++++++++---------
> >  migration/postcopy-ram.c              | 48 ++++++++++++------
> >  tests/qtest/migration/migration-qmp.c |  3 ++
> >  4 files changed, 97 insertions(+), 37 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8b9c53595c..8b13cea169 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -236,6 +236,17 @@
> >  #     This is only present when the postcopy-blocktime migration
> >  #     capability is enabled.  (Since 3.0)
> >  #
> > +# @postcopy-latency: average remote page fault latency (in us).  Note that
> > +#     this doesn't include all faults, but only the ones that require a
> > +#     remote page request.  So it should be always bigger than the real
> > +#     average page fault latency. This is only present when the
> > +#     postcopy-blocktime migration capability is enabled.  (Since 10.1)
> > +#
> > +# @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
> > +#     us).  It has the same definition of @postcopy-latency, but instead
> > +#     this is the per-vCPU statistics. This is only present when the
> 
> Two spaces between sentences for consistency, please.

Fixed.  There's another similar occurrence in the last patch, I'll fix that
too.

> 
> > +#     postcopy-blocktime migration capability is enabled.  (Since 10.1)
> 
> I figure the the @i-th array element is for vCPU with index @i.  Correct?
> 
> This is also only present when @postcopy-blocktime is enabled.  Correct?

Correct on both.

> 
> Could a QMP client compute @postcopy-latency from
> @postcopy-vcpu-latency?

Not with the current API.

Right now, the reported values are the per-vCPU average latencies and the
global average latency; per-vCPU fault counts are not exported yet.  A
client would need those counts to do the calculation.
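
To illustrate why the counts matter, here is a minimal sketch in Python.
It is not part of the patch: global_latency() and its fault-count argument
are hypothetical, since the current API does not export per-vCPU fault
counts.  The global average is the fault-count-weighted mean of the
per-vCPU averages, which generally differs from their plain mean.

```python
def global_latency(vcpu_latency_us, vcpu_fault_count):
    """Fault-count-weighted mean of per-vCPU average latencies (in us).

    Both arguments are hypothetical client-side lists indexed by vCPU;
    the fault counts are NOT exported by the current API.
    """
    if len(vcpu_latency_us) != len(vcpu_fault_count):
        raise ValueError("need exactly one fault count per vCPU")
    total_faults = sum(vcpu_fault_count)
    if total_faults == 0:
        # No remote faults were recorded, so there is nothing to average.
        return 0.0
    # Average latency times fault count gives the total time per vCPU.
    total_time_us = sum(
        lat * cnt for lat, cnt in zip(vcpu_latency_us, vcpu_fault_count)
    )
    return total_time_us / total_faults


# Made-up numbers: vCPU 0 took 9 faults averaging 100us, vCPU 1 took
# 1 fault averaging 300us.
latencies = [100, 300]
counts = [9, 1]

weighted = global_latency(latencies, counts)  # (9*100 + 1*300) / 10
naive = sum(latencies) / len(latencies)       # plain mean, ignores counts
```

With these made-up numbers the weighted result is 120us while the plain
mean of the two per-vCPU values is 200us, so the per-vCPU averages alone
do not determine the global value.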

I chose to export only the global average latency because that is the most
important one to me as of now.  The per-vCPU results are pretty much a side
effect of how the blocktime feature does its accounting so far (which is
per-vCPU), so they are very low-hanging fruit.

Thanks,

-- 
Peter Xu




Thread overview: 30+ messages
2025-05-27 23:12 [PATCH 00/13] migration/postcopy: Blocktime tracking overhaul Peter Xu
2025-05-27 23:12 ` [PATCH 01/13] migration: Add option to set postcopy-blocktime Peter Xu
2025-06-02 16:50   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 02/13] migration/postcopy: Push blocktime start/end into page req mutex Peter Xu
2025-06-02 17:46   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 03/13] migration/postcopy: Drop all atomic ops in blocktime feature Peter Xu
2025-06-02 18:00   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 04/13] migration/postcopy: Make all blocktime vars 64bits Peter Xu
2025-06-02 20:42   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 05/13] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Peter Xu
2025-06-03 15:52   ` Fabiano Rosas
2025-06-03 15:55     ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 06/13] migration/postcopy: Bring blocktime layer to us level Peter Xu
2025-06-03 15:54   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 07/13] migration/postcopy: Add blocktime fault counts per-vcpu Peter Xu
2025-06-03 15:59   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 08/13] migration/postcopy: Report fault latencies in blocktime Peter Xu
2025-06-02  9:26   ` Markus Armbruster
2025-06-02 16:29     ` Peter Xu [this message]
2025-06-03 16:07   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 09/13] migration/postcopy: Initialize blocktime context only until listen Peter Xu
2025-06-03 16:08   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 10/13] migration/postcopy: Cache the tid->vcpu mapping for blocktime Peter Xu
2025-06-03 16:20   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 11/13] migration/postcopy: Cleanup the total blocktime accounting Peter Xu
2025-06-03 16:22   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 12/13] migration/postcopy: Optimize blocktime fault tracking with hashtable Peter Xu
2025-06-03 16:44   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults Peter Xu
2025-06-03 16:50   ` Fabiano Rosas
