public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Cc: Oliver Upton <oupton@kernel.org>, Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Kees Cook <kees@kernel.org>,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	devel@daynix.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v7 3/4] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
Date: Mon, 20 Apr 2026 10:51:57 +0100	[thread overview]
Message-ID: <86qzoa0xj6.wl-maz@kernel.org> (raw)
In-Reply-To: <06c6664c-7f0c-47b2-babf-ba2a541fd9f2@rsg.ci.i.u-tokyo.ac.jp>

On Mon, 20 Apr 2026 09:36:16 +0100,
Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
> 
> On 2026/04/20 2:19, Marc Zyngier wrote:
> > On Sat, 18 Apr 2026 09:14:25 +0100,
> > Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
> >> 
> >> On a heterogeneous arm64 system, KVM's PMU emulation is based on the
> >> features of a single host PMU instance. When a vCPU is migrated to a
> >> pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
> >> incrementing.
> >> 
> >> Although this behavior is permitted by the architecture, Windows does
> >> not handle it gracefully and may crash with a division-by-zero error.
> >> 
> >> The current workaround requires VMMs to pin vCPUs to a set of pCPUs
> >> that share a compatible PMU. This is difficult to implement correctly in
> >> QEMU/libvirt, where pinning occurs after vCPU initialization, and it
> >> also restricts the guest to a subset of available pCPUs.
> >> 
> >> Introduce the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY attribute to
> >> create a "fixed-counters-only" PMU. When set, KVM exposes a PMU that is
> >> compatible with all pCPUs but that does not support programmable
> >> event counters which may have different feature sets on different PMUs.
> >> 
> >> This allows Windows guests to run reliably on heterogeneous systems
> >> without crashing, even without vCPU pinning, and enables VMMs to
> >> schedule vCPUs across all available pCPUs, making full use of the host
> >> hardware.
> >> 
> >> Much like KVM_ARM_VCPU_PMU_V3_IRQ and other read-write attributes, this
> >> attribute provides a getter that facilitates kernel and userspace
> >> debugging/testing.
> > 
> > OK, so that's the sales pitch. But how is it implemented? I would like
> > to be able to read a high-level description of the implementation
> > trade-offs.
> 
> Implementation-wise it is very trivial. Essentially the following
> addition in kvm_arm_pmu_v3_get_attr() is the entire implementation:
> +	case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
> +		if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
> &vcpu->kvm->arch.flags))
> +			return 0;
> 
> Both its functionality and code complexity is trivial. So we can argue that:
> - the functionality is too trivial to be useful or
> - the interface/implementation complexity is so trivial that it does not
>   incur maintenance burden
> 
> In this case the selftest uses the getter so I was more inclined to
> have it, but adding one just for the selftest sounds too ad-hoc, so
> here I looked into other attributes to ensure that it was not
> introducing inconsistency with existing interfaces.
> 
> As the result, I found there are other read-write attributes; in fact
> there are more read-write attributes than write-only ones.

You're completely missing the point. I'm referring to the whole of the
commit message, which is more of a marketing slide than a technical
description.

I really don't care about the getter at this stage, which while
pointless, does not make things more awful than they already are.

> 
> > 
> >> 
> >> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> >> ---
> >>   Documentation/virt/kvm/devices/vcpu.rst |  29 ++++++
> >>   arch/arm64/include/asm/kvm_host.h       |   2 +
> >>   arch/arm64/include/uapi/asm/kvm.h       |   1 +
> >>   arch/arm64/kvm/arm.c                    |   1 +
> >>   arch/arm64/kvm/pmu-emul.c               | 155 +++++++++++++++++++++++---------
> >>   include/kvm/arm_pmu.h                   |   2 +
> >>   6 files changed, 147 insertions(+), 43 deletions(-)
> >> 
> >> diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
> >> index 60bf205cb373..e0aeb1897d77 100644
> >> --- a/Documentation/virt/kvm/devices/vcpu.rst
> >> +++ b/Documentation/virt/kvm/devices/vcpu.rst
> >> @@ -161,6 +161,35 @@ explicitly selected, or the number of counters is out of range for the
> >>   selected PMU. Selecting a new PMU cancels the effect of setting this
> >>   attribute.
> >>   +1.6 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
> >> +------------------------------------------------------
> >> +
> >> +:Parameters: no additional parameter in kvm_device_attr.addr
> >> +
> >> +:Returns:
> >> +
> >> +	 =======  =====================================================
> >> +	 -EBUSY   Attempted to set after initializing PMUv3 or running
> >> +		  VCPU, or attempted to set for the first time after
> >> +		  setting an event filter
> >> +	 -ENXIO   Attempted to get before setting
> >> +	 -ENODEV  Attempted to set while PMUv3 not supported
> >> +	 =======  =====================================================
> >> +
> >> +If set, PMUv3 will be emulated without programmable event counters. The VCPU
> >> +will use any compatible hardware PMU. This attribute is particularly useful on
> > 
> > Not quite "any PMU". It will use *the* PMU of the physical CPU,
> > irrespective of the implementation.
> 
> I think:
> 
> - this comment
> - one on the KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED note
> - one on kvm_pmu_create_perf_event()
> - and one on kvm_arm_pmu_v3_set_pmu_fixed_counters_only()
> 
> All boil down into one question: will it support all possible CPUs, or
> will it support a subset? Let me answer here:
> 
> This patch is written to support a subset instead of all possible
> CPUs. If a pCPU does not have a compatible PMU, the pCPU will not be
> supported and cause KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED.

This is not a thing. Either *all* the CPUs have a PMU that can be used
for KVM, or PMU support is not offered to guests. That's a hard line
in the sand. And the code already upholds this by checking the
sanitised PMUVer field.

>
> This patch does not enforce all possible CPUs are covered by the
> compatible PMUs. Theoretically speaking,
> kvm_arm_pmu_get_pmuver_limit() enables the PMU emulation when real
> PMUv3 hardware covers all possible CPUs *or* the relevant registers
> can be trapped with IMPDEF, so some pCPU may not have a compatible PMU
> and only provide the IMPDEF trapping.

How is that possible? Please describe the case where that can happen,
and I will make sure that such a system stops booting. The intent is
definitely that that:

- for early CPUs, we take the minimal capability of all CPUs

- for late CPUs, either they match at least the capability recorded by
  early CPUs, or they don't boot.

> Practically, I don't think any sane configuration will ever have such
> a subset support, so we can explicitly enforce all possible CPUs are
> covered by the compatible PMUs if desired.

That's not just desired. This is a requirement. And it is already
enforced AFAICS.

> 
> > 
> >> +heterogeneous systems where different hardware PMUs cover different physical
> >> +CPUs. The compatibility of hardware PMUs can be checked with
> >> +KVM_ARM_VCPU_PMU_V3_SET_PMU. All VCPUs in a VM share this attribute. It isn't
> >> +possible to set it for the first time if a PMU event filter is already present.
> > 
> > "for the first time" gives the impression that it will work if you try
> > again. I'd rather we say that "This feature is incompatible with the
> > existence of a PMU event filter".
> 
> The following sequence will work:
> 1. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
> 2. Set KVM_ARM_VCPU_PMU_V3_FILTER
> 3. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
> 
> This is to make the behavior conistent with KVM_ARM_VCPU_PMU_V3_SET_PMU.

I don't think this is correct. Filtering is completely at odds with
this patch, and I don't want to have to reason about the combination.

[...]

> >> +	int i;
> >> +
> >> +	for_each_set_bit(i, &mask, 32) {
> >> +		pmc = kvm_vcpu_idx_to_pmc(vcpu, i);
> >> +		if (!pmc->perf_event)
> >> +			continue;
> >> +
> >> +		cpu_pmu = to_arm_pmu(pmc->perf_event->pmu);
> >> +		if (!cpumask_test_cpu(vcpu->cpu, &cpu_pmu->supported_cpus)) {
> >> +			kvm_make_request(KVM_REQ_RELOAD_PMU, vcpu);
> >> +			break;
> >> +		}
> >> +	}
> >> +}
> >> +
> > 
> > Why do we need to inflict this on VMs that do not have the fixed
> > counter restriction?
> 
> This function is to re-create the perf_event in case the current
> perf_event does not support the pCPU because e.g., the pCPU is a
> E-core while the perf_event only covers the P-cores.

That's not what I meant. This code is only here to support the
fixed-function feature. It makes no sense outside of it, because *we
don't support counter migration across implementations*.

So what's the purpose of this stuff for the normal KVM setup?

> 
> > 
> > And even then, all you have to reconfigure is the cycle counter. So
> > why the loop? All we want to find out is whether the cycle counter is
> > instantiated on the PMU that matches the current CPU.
> 
> I just wanted to avoid hardcoding assumptions on the fixed
> counter(s). FEAT_PMUv3_ICNTR will be naturaly handled with a loop, for
> example.

Well, not that loop, since ICNTR is counter 32. So please let's stop
the nonsense and only add what is required?

[...]

> >>   +
> >> clear_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
> >> &kvm->arch.flags);
> > 
> > Why does this need to be cleared? I'd rather we make sure it is never
> > set the first place.
> 
> KVM_ARM_VCPU_PMU_V3_SET_PMU and
> KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY can be set on the same
> VCPU. The last KVM_ARM_VCPU_PMU_V3_SET_PMU or
> KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY setting will be effective.
> 
> A VMM may try set these attributes to check if the setting is
> supported. For example, the RFC QEMU patch first uses
> KVM_ARM_VCPU_PMU_V3_SET_PMU to find a compatible PMU that covers all
> pCPUs, and then falls back to
> KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY. The order of such probing is
> up to the VMM.

KVM_ARM_VCPU_PMU_V3_SET_PMU is not a probing mechanism. You must probe
the PMUs by looking in /sys/bus/event_source/devices/, like kvmtool
does.

So there is no reason to support this stuff, and the two flags should
be made mutually exclusive.

[...]

> >>
> > 
> > In conclusion, I find this patch to be rather messy. For a start, it
> > needs to be split in at least 5 patches:
> > 
> > - at least two for the refactoring
> > - one for the PMU core changes
> > - one for the UAPI
> > - one for documentation
> 
> That clarifies the expected granurarity of patches. The next version
> will be in that layout, perhaps with more patches if an additional
> change. Thanks for the guidance.
> 
> > 
> > I'd also like some clarification on how this is intended to work if we
> > enable FEAT_PMUv3_ICNTR, because the definition seems to be designed
> > to encompass all fixed-function counters, and I expect this to grow
> > over time.
> 
> Indeed the UAPI was designed to encompass all fixed-function counters
> as suggested by Oliver.
> 
> To support the UAPI, the implementation avoids hardcoding the
> assumption on the fixed counter(s). FEAT_PMUv3_INCTR will be naturaly
> supported once the common code is properly updated (i.e., the size of
> the event counter bitmask is grown the corresponding registers are
> wired up with a proper check of the feature.)
> 
> I expect migration will be handled with the conventional register
> getters and setters, but please share if you have a concern.

At the very least I want to see some documentation explaining that.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

  reply	other threads:[~2026-04-20  9:52 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-18  8:14 [PATCH v7 0/4] KVM: arm64: PMU: Use multiple host PMUs Akihiko Odaki
2026-04-18  8:14 ` [PATCH v7 1/4] KVM: arm64: PMU: Add kvm_pmu_enabled_counter_mask() Akihiko Odaki
2026-04-19 14:13   ` Marc Zyngier
2026-04-20  5:27     ` Akihiko Odaki
2026-04-18  8:14 ` [PATCH v7 2/4] KVM: arm64: PMU: Protect the list of PMUs with RCU Akihiko Odaki
2026-04-19 14:34   ` Marc Zyngier
2026-04-20  6:21     ` Akihiko Odaki
2026-04-20  7:01       ` Marc Zyngier
2026-04-20  7:17         ` Akihiko Odaki
2026-04-18  8:14 ` [PATCH v7 3/4] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY Akihiko Odaki
2026-04-19 17:19   ` Marc Zyngier
2026-04-20  8:36     ` Akihiko Odaki
2026-04-20  9:51       ` Marc Zyngier [this message]
2026-04-20 12:07         ` Akihiko Odaki
2026-04-20 13:53           ` Marc Zyngier
2026-04-18  8:14 ` [PATCH v7 4/4] KVM: arm64: selftests: Test PMU_V3_FIXED_COUNTERS_ONLY Akihiko Odaki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86qzoa0xj6.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=devel@daynix.com \
    --cc=gustavoars@kernel.org \
    --cc=joey.gouly@arm.com \
    --cc=kees@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=odaki@rsg.ci.i.u-tokyo.ac.jp \
    --cc=oupton@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=shuah@kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox