From: Sean Christopherson <seanjc@google.com>
To: Fernand Sieber <sieberf@amazon.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>,
x86@kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, nh-open-source@amazon.com
Subject: Re: [RFC PATCH 0/3] kvm,sched: Add gtime halted
Date: Tue, 25 Feb 2025 18:17:19 -0800 [thread overview]
Message-ID: <Z755r4S_7BLbHlWa@google.com> (raw)
In-Reply-To: <20250218202618.567363-1-sieberf@amazon.com>
On Tue, Feb 18, 2025, Fernand Sieber wrote:
> With guest hlt, pause and mwait pass through, the hypervisor loses
> visibility on real guest cpu activity. From the point of view of the
> host, such vcpus are always 100% active even when the guest is
> completely halted.
>
> Typically hlt, pause and mwait pass through is only implemented on
> non-timeshared pcpus. However, there are cases where this assumption
> cannot be strictly met as some occasional housekeeping work needs to be
What housekeeping work?
> scheduled on such cpus while we generally want to preserve the pass
> through performance gains. This applies for system which don't have
> dedicated cpus for housekeeping purposes.
>
> In such cases, the lack of visibility of the hypervisor is problematic
> from a load balancing point of view. In the absence of a better signal,
> it will preemt vcpus at random. For example it could decide to interrupt
> a vcpu doing critical idle poll work while another vcpu sits idle.
>
> Another motivation for gaining visibility into real guest cpu activity
> is to enable the hypervisor to vend metrics about it for external
> consumption.
Such as?
> In this RFC we introduce the concept of guest halted time to address
> these concerns. Guest halted time (gtime_halted) accounts for cycles
> spent in guest mode while the cpu is halted. gtime_halted relies on
> measuring the mperf msr register (x86) around VM enter/exits to compute
> the number of unhalted cycles; halted cycles are then derived from the
> tsc difference minus the mperf difference.
IMO, there are better ways to solve this than having KVM sample MPERF on every
entry and exit.
The kernel already samples APERF/MPREF on every tick and provides that information
via /proc/cpuinfo, just use that. If your userspace is unable to use /proc/cpuinfo
or similar, that needs to be explained.
And if you're running vCPUs on tickless CPUs, and you're doing HLT/MWAIT passthrough,
*and* you want to schedule other tasks on those CPUs, then IMO you're abusing all
of those things and it's not KVM's problem to solve, especially now that sched_ext
is a thing.
> gtime_halted is exposed to proc/<pid>/stat as a new entry, which enables
> users to monitor real guest activity.
>
> gtime_halted is also plumbed to the scheduler infrastructure to discount
> halted cycles from fair load accounting. This enlightens the load
> balancer to real guest activity for better task placement.
>
> This initial RFC has a few limitations and open questions:
> * only the x86 infrastructure is supported as it relies on architecture
> dependent registers. Future development will extend this to ARM.
> * we assume that mperf accumulates as the same rate as tsc. While I am
> not certain whether this assumption is ever violated, the spec doesn't
> seem to offer this guarantee [1] so we may want to calibrate mperf.
> * the sched enlightenment logic relies on periodic gtime_halted updates.
> As such, it is incompatible with nohz full because this could result
> in long periods of no update followed by a massive halted time update
> which doesn't play well with the existing PELT integration. It is
> possible to address this limitation with generalized, more complex
> accounting.
next prev parent reply other threads:[~2025-02-26 2:17 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-18 20:26 [RFC PATCH 0/3] kvm,sched: Add gtime halted Fernand Sieber
2025-02-18 20:26 ` [RFC PATCH 1/3] fs/proc: Add gtime halted to proc/<pid>/stat Fernand Sieber
2025-02-18 20:26 ` [RFC PATCH 2/3] kvm/x86: Add support for gtime halted Fernand Sieber
2025-02-18 20:26 ` [RFC PATCH 3/3] sched,x86: Make the scheduler guest unhalted aware Fernand Sieber
2025-02-27 7:34 ` Vincent Guittot
2025-02-27 8:27 ` [RFC PATCH 3/3] sched, x86: " Sieber, Fernand
2025-02-27 9:03 ` Vincent Guittot
2025-02-26 2:17 ` Sean Christopherson [this message]
2025-02-26 20:27 ` [RFC PATCH 0/3] kvm,sched: Add gtime halted Sieber, Fernand
2025-02-26 21:00 ` Sean Christopherson
2025-02-27 7:20 ` Sieber, Fernand
2025-02-27 14:39 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z755r4S_7BLbHlWa@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=nh-open-source@amazon.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=sieberf@amazon.com \
--cc=vincent.guittot@linaro.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox