Re: [RFC PATCH 0/3] kvm,sched: Add gtime halted

public inbox for kvm@vger.kernel.org

From: Sean Christopherson <seanjc@google.com>
To: Fernand Sieber <sieberf@amazon.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	x86@kernel.org,  kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, nh-open-source@amazon.com
Subject: Re: [RFC PATCH 0/3] kvm,sched: Add gtime halted
Date: Tue, 25 Feb 2025 18:17:19 -0800
Message-ID: <Z755r4S_7BLbHlWa@google.com>
In-Reply-To: <20250218202618.567363-1-sieberf@amazon.com>

On Tue, Feb 18, 2025, Fernand Sieber wrote:
> With guest hlt, pause and mwait pass through, the hypervisor loses
> visibility on real guest cpu activity. From the point of view of the
> host, such vcpus are always 100% active even when the guest is
> completely halted.
> 
> Typically hlt, pause and mwait pass through is only implemented on
> non-timeshared pcpus. However, there are cases where this assumption
> cannot be strictly met as some occasional housekeeping work needs to be

What housekeeping work?

> scheduled on such cpus while we generally want to preserve the pass
> through performance gains. This applies to systems which don't have
> dedicated cpus for housekeeping purposes.
> 
> In such cases, the lack of visibility of the hypervisor is problematic
> from a load balancing point of view. In the absence of a better signal,
> it will preempt vcpus at random. For example it could decide to interrupt
> a vcpu doing critical idle poll work while another vcpu sits idle.
> 
> Another motivation for gaining visibility into real guest cpu activity
> is to enable the hypervisor to vend metrics about it for external
> consumption.

Such as?

> In this RFC we introduce the concept of guest halted time to address
> these concerns. Guest halted time (gtime_halted) accounts for cycles
> spent in guest mode while the cpu is halted. gtime_halted relies on
> measuring the mperf MSR (x86) around VM entries/exits to compute
> the number of unhalted cycles; halted cycles are then derived from the
> tsc difference minus the mperf difference.
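
The derivation quoted above reduces to a subtraction of two counter deltas. The minimal C sketch below shows that arithmetic only; the struct and function names are hypothetical (not taken from the patches), and it assumes, as the RFC itself does, that MPERF advances at the TSC rate while the CPU is unhalted and that both counters are read on the same CPU at entry and exit.

```c
#include <stdint.h>

/* Hypothetical sample of the two counters taken at one point in time. */
struct gtime_sample {
	uint64_t tsc;   /* TSC value */
	uint64_t mperf; /* IA32_MPERF value; assumed to tick at the TSC rate in C0 */
};

/*
 * Cycles the guest spent halted between @enter (VM entry) and @leave
 * (VM exit): total TSC cycles minus unhalted (MPERF) cycles.
 */
static uint64_t gtime_halted_cycles(const struct gtime_sample *enter,
				    const struct gtime_sample *leave)
{
	uint64_t d_tsc = leave->tsc - enter->tsc;       /* cycles spent in guest mode */
	uint64_t d_mperf = leave->mperf - enter->mperf; /* cycles spent unhalted */

	/* Clamp at zero in case a rate mismatch makes MPERF run ahead of the TSC. */
	return d_mperf >= d_tsc ? 0 : d_tsc - d_mperf;
}
```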

IMO, there are better ways to solve this than having KVM sample MPERF on every
entry and exit.

The kernel already samples APERF/MPERF on every tick and provides that information
via /proc/cpuinfo, just use that.  If your userspace is unable to use /proc/cpuinfo
or similar, that needs to be explained.
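
As an illustration of consuming that from userspace, here is a minimal C sketch that pulls the per-CPU "cpu MHz" values out of the text of /proc/cpuinfo. It assumes the usual x86 layout of one "cpu MHz\t\t: <value>" line per logical CPU; reading the file into a buffer is left to the caller, and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract up to @max "cpu MHz" values from the text of /proc/cpuinfo.
 * Returns the number of values stored in @mhz.
 */
static int parse_cpu_mhz(const char *text, double *mhz, int max)
{
	int n = 0;
	const char *line = text;

	while (line && *line && n < max) {
		double v;

		/* Matching lines look like "cpu MHz\t\t: 2400.000". */
		if (strncmp(line, "cpu MHz", 7) == 0 &&
		    sscanf(line, "cpu MHz : %lf", &v) == 1)
			mhz[n++] = v;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return n;
}
```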

And if you're running vCPUs on tickless CPUs, and you're doing HLT/MWAIT passthrough,
*and* you want to schedule other tasks on those CPUs, then IMO you're abusing all
of those things and it's not KVM's problem to solve, especially now that sched_ext
is a thing.

> gtime_halted is exposed to proc/<pid>/stat as a new entry, which enables
> users to monitor real guest activity.
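
A monitoring tool would read such an entry the same way it reads the existing guest_time (field 43 in proc(5), in clock ticks). The C sketch below extracts a numbered field from one line of /proc/<pid>/stat; the position of the new gtime_halted entry is defined by patch 1/3 and is not assumed here, and the helper name is hypothetical. Parsing restarts after the last ')' because the comm field may itself contain spaces and parentheses.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Read field @field (1-based, numbered as in proc(5), must be >= 3) from one
 * line of /proc/<pid>/stat. Returns 0 on success, -1 if the field is missing
 * or not numeric.
 */
static int stat_field(const char *line, int field, unsigned long long *out)
{
	const char *p = strrchr(line, ')'); /* end of comm, which may contain ')' */
	int cur = 3;                        /* first field after comm is state (3) */

	if (!p || field < 3)
		return -1;
	p++;
	while (*p) {
		while (*p == ' ')
			p++;
		if (!*p || *p == '\n')
			break;
		if (cur == field) {
			char *end;

			*out = strtoull(p, &end, 10);
			return end == p ? -1 : 0;
		}
		/* Skip over the current field. */
		while (*p && *p != ' ' && *p != '\n')
			p++;
		cur++;
	}
	return -1;
}
```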
> 
> gtime_halted is also plumbed to the scheduler infrastructure to discount
> halted cycles from fair load accounting. This enlightens the load
> balancer to real guest activity for better task placement.
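
The cover letter does not spell out how halted cycles are discounted. Purely as a hypothetical illustration of one simple interpretation, the C sketch below converts halted TSC cycles to nanoseconds and subtracts them from the runtime delta before it would be charged to load tracking. The function names and the tsc_khz parameter are assumptions, and the plain multiplication is only safe for short (per-tick) deltas.

```c
#include <stdint.h>

/* TSC cycles to nanoseconds: tsc_khz cycles elapse per 1,000,000 ns. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
	/* May overflow for very large deltas; fine for short intervals. */
	return cycles * 1000000ULL / tsc_khz;
}

/*
 * Hypothetical: runtime to charge after removing time the guest spent
 * halted, clamped at zero.
 */
static uint64_t charged_runtime_ns(uint64_t delta_exec_ns,
				   uint64_t halted_cycles, uint64_t tsc_khz)
{
	uint64_t halted_ns = cycles_to_ns(halted_cycles, tsc_khz);

	return halted_ns >= delta_exec_ns ? 0 : delta_exec_ns - halted_ns;
}
```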
> 
> This initial RFC has a few limitations and open questions:
> * only the x86 infrastructure is supported as it relies on architecture
>   dependent registers. Future development will extend this to ARM.
> * we assume that mperf accumulates at the same rate as tsc. While I am
>   not certain whether this assumption is ever violated, the spec doesn't
>   seem to offer this guarantee [1] so we may want to calibrate mperf.
> * the sched enlightenment logic relies on periodic gtime_halted updates.
>   As such, it is incompatible with nohz full because this could result
>   in long periods of no update followed by a massive halted time update
>   which doesn't play well with the existing PELT integration. It is
>   possible to address this limitation with generalized, more complex
>   accounting.
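
If MPERF were found not to tick at the TSC rate, the calibration mentioned in the second bullet could, for example, measure both deltas over an interval in which the CPU never leaves C0 and rescale later MPERF deltas into TSC units. The C sketch below shows that idea with a 32.32 fixed-point factor; the names and the shift are hypothetical, and it relies on the GCC/Clang unsigned __int128 extension to avoid overflow.

```c
#include <stdint.h>

#define MPERF_SCALE_SHIFT 32 /* hypothetical fixed-point precision */

/*
 * Compute tsc/mperf in 32.32 fixed point from deltas measured while the CPU
 * stayed unhalted (ideally d_tsc == d_mperf). Returns 0 if unusable.
 */
static uint64_t calibrate_mperf_scale(uint64_t d_tsc, uint64_t d_mperf)
{
	if (d_mperf == 0)
		return 0;
	return (uint64_t)(((unsigned __int128)d_tsc << MPERF_SCALE_SHIFT) / d_mperf);
}

/* Convert an MPERF delta into TSC cycles using the calibrated scale. */
static uint64_t mperf_to_tsc(uint64_t d_mperf, uint64_t scale)
{
	return (uint64_t)(((unsigned __int128)d_mperf * scale) >> MPERF_SCALE_SHIFT);
}
```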


Thread overview: 12+ messages
2025-02-18 20:26 [RFC PATCH 0/3] kvm,sched: Add gtime halted Fernand Sieber
2025-02-18 20:26 ` [RFC PATCH 1/3] fs/proc: Add gtime halted to proc/<pid>/stat Fernand Sieber
2025-02-18 20:26 ` [RFC PATCH 2/3] kvm/x86: Add support for gtime halted Fernand Sieber
2025-02-18 20:26 ` [RFC PATCH 3/3] sched,x86: Make the scheduler guest unhalted aware Fernand Sieber
2025-02-27  7:34   ` Vincent Guittot
2025-02-27  8:27     ` [RFC PATCH 3/3] sched, x86: " Sieber, Fernand
2025-02-27  9:03       ` Vincent Guittot
2025-02-26  2:17 ` Sean Christopherson [this message]
2025-02-26 20:27   ` [RFC PATCH 0/3] kvm,sched: Add gtime halted Sieber, Fernand
2025-02-26 21:00     ` Sean Christopherson
2025-02-27  7:20       ` Sieber, Fernand
2025-02-27 14:39         ` Sean Christopherson
