From: Sean Christopherson <seanjc@google.com>
To: Fernand Sieber <sieberf@amazon.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	x86@kernel.org,  kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, nh-open-source@amazon.com
Subject: Re: [RFC PATCH 0/3] kvm,sched: Add gtime halted
Date: Tue, 25 Feb 2025 18:17:19 -0800
Message-ID: <Z755r4S_7BLbHlWa@google.com>
In-Reply-To: <20250218202618.567363-1-sieberf@amazon.com>

On Tue, Feb 18, 2025, Fernand Sieber wrote:
> With guest hlt, pause and mwait pass through, the hypervisor loses
> visibility on real guest cpu activity. From the point of view of the
> host, such vcpus are always 100% active even when the guest is
> completely halted.
> 
> Typically hlt, pause and mwait pass through is only implemented on
> non-timeshared pcpus. However, there are cases where this assumption
> cannot be strictly met as some occasional housekeeping work needs to be

What housekeeping work?

> scheduled on such cpus while we generally want to preserve the pass
> through performance gains. This applies to systems which don't have
> dedicated cpus for housekeeping purposes.
> 
> In such cases, the lack of visibility of the hypervisor is problematic
> from a load balancing point of view. In the absence of a better signal,
> it will preempt vcpus at random. For example it could decide to interrupt
> a vcpu doing critical idle poll work while another vcpu sits idle.
> 
> Another motivation for gaining visibility into real guest cpu activity
> is to enable the hypervisor to vend metrics about it for external
> consumption.

Such as?

> In this RFC we introduce the concept of guest halted time to address
> these concerns. Guest halted time (gtime_halted) accounts for cycles
> spent in guest mode while the cpu is halted. gtime_halted relies on
> measuring the mperf msr register (x86) around VM enter/exits to compute
> the number of unhalted cycles; halted cycles are then derived from the
> tsc difference minus the mperf difference.
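
For readers following along, the arithmetic described above can be sketched
roughly as below. This is only an illustration of the subtraction, not the
code from the series: struct cycle_snapshot and halted_cycles() are
hypothetical names, and the sketch assumes MPERF counts at the same rate as
the TSC, which the cover letter itself lists as an open question.

```c
#include <stdint.h>

/* Hypothetical pair of counter readings taken at a VM-enter or VM-exit. */
struct cycle_snapshot {
	uint64_t tsc;   /* advances whether or not the CPU is halted */
	uint64_t mperf; /* advances only while the CPU is unhalted */
};

/*
 * Halted cycles between two snapshots: TSC delta minus MPERF delta.
 * Assumption: both counters tick at the same rate. The result is clamped
 * to zero so that a small skew between the two reads cannot underflow.
 */
uint64_t halted_cycles(struct cycle_snapshot before,
		       struct cycle_snapshot after)
{
	uint64_t tsc_delta = after.tsc - before.tsc;
	uint64_t mperf_delta = after.mperf - before.mperf;

	return tsc_delta > mperf_delta ? tsc_delta - mperf_delta : 0;
}
```

With snapshots of (tsc=1000, mperf=400) and (tsc=2000, mperf=700), the TSC
advanced by 1000 and MPERF by 300, so 700 cycles are attributed to the
halted state.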

IMO, there are better ways to solve this than having KVM sample MPERF on every
entry and exit.

The kernel already samples APERF/MPERF on every tick and provides that information
via /proc/cpuinfo, just use that.  If your userspace is unable to use /proc/cpuinfo
or similar, that needs to be explained.
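
As a rough sketch of what consuming that from userspace might look like,
assuming the "cpu MHz" lines of /proc/cpuinfo are the field in question (the
message above does not name a specific field). parse_cpu_mhz_line() and
read_cpu_mhz() are hypothetical helpers written for this illustration:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Parse one /proc/cpuinfo line of the form "cpu MHz\t\t: 2400.000".
 * Returns 1 and stores the value in *mhz on a match, 0 otherwise.
 * Whitespace in the sscanf format matches any run of spaces or tabs.
 */
int parse_cpu_mhz_line(const char *line, double *mhz)
{
	return sscanf(line, "cpu MHz : %lf", mhz) == 1;
}

/*
 * Collect up to max "cpu MHz" values from an open stream, in file order
 * (one entry per logical CPU on x86). Returns the number of values stored.
 * Long lines such as "flags" are read in several chunks by fgets(); none of
 * those chunks match the "cpu MHz" prefix, so they are simply skipped.
 */
size_t read_cpu_mhz(FILE *f, double *out, size_t max)
{
	char line[256];
	size_t n = 0;

	while (n < max && fgets(line, sizeof(line), f)) {
		if (parse_cpu_mhz_line(line, &out[n]))
			n++;
	}
	return n;
}
```

A caller would fopen("/proc/cpuinfo", "r"), pass the stream to
read_cpu_mhz(), and compare each value against the CPU's nominal frequency
to estimate how busy it was over the last sampling window.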

And if you're running vCPUs on tickless CPUs, and you're doing HLT/MWAIT passthrough,
*and* you want to schedule other tasks on those CPUs, then IMO you're abusing all
of those things and it's not KVM's problem to solve, especially now that sched_ext
is a thing.

> gtime_halted is exposed to proc/<pid>/stat as a new entry, which enables
> users to monitor real guest activity.
> 
> gtime_halted is also plumbed to the scheduler infrastructure to discount
> halted cycles from fair load accounting. This enlightens the load
> balancer to real guest activity for better task placement.
> 
> This initial RFC has a few limitations and open questions:
> * only the x86 infrastructure is supported as it relies on architecture
>   dependent registers. Future development will extend this to ARM.
> * we assume that mperf accumulates at the same rate as tsc. While I am
>   not certain whether this assumption is ever violated, the spec doesn't
>   seem to offer this guarantee [1] so we may want to calibrate mperf.
> * the sched enlightenment logic relies on periodic gtime_halted updates.
>   As such, it is incompatible with nohz full because this could result
>   in long periods of no update followed by a massive halted time update
>   which doesn't play well with the existing PELT integration. It is
>   possible to address this limitation with generalized, more complex
>   accounting.
