From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6862C2EA75E; Mon, 20 Apr 2026 13:53:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776693199; cv=none; b=OEATqB0g52NmSdCCIWzCDLjLP2q4swubpW9w55rdNES6DK77KXzMkXYlwUOjucg/2rSlVjUzCnOyM1HU+SafC/OvmGsn7Ld2Mxk+XxcTJw4bGFSZv+jStTYVStpby6meW2zetauI2rKozfbgHpLJVeIiUit+uTVJUduucCckmVE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776693199; c=relaxed/simple; bh=/2WX7QRrreqNa1lPcnWhMEbZ0TkX/QFlLwzbB3DhuUg=; h=Date:Message-ID:From:To:Cc:Subject:In-Reply-To:References: MIME-Version:Content-Type; b=HUfLQbJHOmdiU1r7Gc4u/2b3r60RkIuTkMNoJ7lLchsSwAoETvEeBGuqrMHM8Zp9vZ7q8LLhJFH3uZG8mHcz0HrAuQLM1eFR8jvk8px/HGfSM2KZ8UM/f5V79e6CTrrYwyh+XqDLGOqZlo2Wy1CEv5IxncF5OSSHD/tlJtf8Lm4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CLHDrjXd; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CLHDrjXd" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CEB41C19425; Mon, 20 Apr 2026 13:53:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1776693199; bh=/2WX7QRrreqNa1lPcnWhMEbZ0TkX/QFlLwzbB3DhuUg=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=CLHDrjXdglACtNImEqi28IDMMq7a5VhtuFLdyxVmCXCyceC4zd94riRPfAVgYMPA0 EvhnwhPlhqZSYz1oW/9EpYgcRqCh0KlVGeYyKv0OaJ+I3bcrttEzeUId2W6blRETn6 m47ogPkO9Fvt5G3+IJTKRSl230r0zn/MAa3H5V2XJ7nKn4yOY0wRf2BX6yvvqTy3QP MzM+0VSDOTi2+sUr92tv5K+/QzK2ZzxE0ur6/57SeB/VszFCS0E3IiRytMNb1KD7pn iveHZNnTVhLopLxmDmSXERfBw8mGUtEfPMeRPcdm2RvxvO53fB17/KW31sQf6hOor2 FJmdjbmlZxbHw== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1wEp3s-0000000D5cc-1J1N; Mon, 20 Apr 2026 13:53:16 +0000 Date: Mon, 20 Apr 2026 14:53:15 +0100 Message-ID: <86ldeh20xg.wl-maz@kernel.org> From: Marc Zyngier To: Akihiko Odaki Cc: Oliver Upton , Joey Gouly , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Kees Cook , "Gustavo A. R. Silva" , Paolo Bonzini , Jonathan Corbet , Shuah Khan , linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org, devel@daynix.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: Re: [PATCH v7 3/4] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY In-Reply-To: References: <20260418-hybrid-v7-0-2bf39ad009bf@rsg.ci.i.u-tokyo.ac.jp> <20260418-hybrid-v7-3-2bf39ad009bf@rsg.ci.i.u-tokyo.ac.jp> <87ldeic1gk.wl-maz@kernel.org> <06c6664c-7f0c-47b2-babf-ba2a541fd9f2@rsg.ci.i.u-tokyo.ac.jp> <86qzoa0xj6.wl-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/30.1 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: odaki@rsg.ci.i.u-tokyo.ac.jp, oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org, kees@kernel.org, gustavoars@kernel.org, pbonzini@redhat.com, corbet@lwn.net, shuah@kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org, devel@daynix.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false On Mon, 20 Apr 2026 13:07:15 +0100, Akihiko Odaki wrote: > > On 2026/04/20 18:51, Marc Zyngier wrote: > > On Mon, 20 Apr 2026 09:36:16 +0100, > > Akihiko Odaki wrote: > >> > >> On 2026/04/20 2:19, Marc Zyngier wrote: > >>> On Sat, 18 Apr 2026 09:14:25 +0100, > >>> Akihiko Odaki wrote: > >>>> > >>>> On a heterogeneous arm64 system, KVM's PMU emulation is based on the > >>>> features of a single host PMU instance. When a vCPU is migrated to a > >>>> pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop > >>>> incrementing. > >>>> > >>>> Although this behavior is permitted by the architecture, Windows does > >>>> not handle it gracefully and may crash with a division-by-zero error. > >>>> > >>>> The current workaround requires VMMs to pin vCPUs to a set of pCPUs > >>>> that share a compatible PMU. This is difficult to implement correctly in > >>>> QEMU/libvirt, where pinning occurs after vCPU initialization, and it > >>>> also restricts the guest to a subset of available pCPUs. > >>>> > >>>> Introduce the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY attribute to > >>>> create a "fixed-counters-only" PMU. When set, KVM exposes a PMU that is > >>>> compatible with all pCPUs but that does not support programmable > >>>> event counters which may have different feature sets on different PMUs. > >>>> > >>>> This allows Windows guests to run reliably on heterogeneous systems > >>>> without crashing, even without vCPU pinning, and enables VMMs to > >>>> schedule vCPUs across all available pCPUs, making full use of the host > >>>> hardware. > >>>> > >>>> Much like KVM_ARM_VCPU_PMU_V3_IRQ and other read-write attributes, this > >>>> attribute provides a getter that facilitates kernel and userspace > >>>> debugging/testing. > >>> > >>> OK, so that's the sales pitch. But how is it implemented? I would like > >>> to be able to read a high-level description of the implementation > >>> trade-offs. > >> > >> Implementation-wise it is very trivial. Essentially the following > >> addition in kvm_arm_pmu_v3_get_attr() is the entire implementation: > >> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY: > >> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, > >> &vcpu->kvm->arch.flags)) > >> + return 0; > >> > >> Both its functionality and code complexity is trivial. So we can argue that: > >> - the functionality is too trivial to be useful or > >> - the interface/implementation complexity is so trivial that it does not > >> incur maintenance burden > >> > >> In this case the selftest uses the getter so I was more inclined to > >> have it, but adding one just for the selftest sounds too ad-hoc, so > >> here I looked into other attributes to ensure that it was not > >> introducing inconsistency with existing interfaces. > >> > >> As the result, I found there are other read-write attributes; in fact > >> there are more read-write attributes than write-only ones. > > > > You're completely missing the point. I'm referring to the whole of the > > commit message, which is more of a marketing slide than a technical > > description. > > In terms of implementation, the obvious tradeoff is that it adds more > code to implement the feature. One thing to note is that > kvm_vcpu_load_pmu() is added and is called each time a vCPU migrates > across pCPUs. The heavy part, making the KVM_REQ_RELOAD_PMU request, > only happens when the feature is enabled. Well, that's what I want to see. The repeated blurb about Windows being broken is cover letter material, but not fir for a commit message. [...] > >>> "for the first time" gives the impression that it will work if you try > >>> again. I'd rather we say that "This feature is incompatible with the > >>> existence of a PMU event filter". > >> > >> The following sequence will work: > >> 1. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY > >> 2. Set KVM_ARM_VCPU_PMU_V3_FILTER > >> 3. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY > >> > >> This is to make the behavior conistent with KVM_ARM_VCPU_PMU_V3_SET_PMU. > > > > I don't think this is correct. Filtering is completely at odds with > > this patch, and I don't want to have to reason about the combination. > > kvm_arm_pmu_v3_set_pmu() has the following condition: > > if (kvm_vm_has_ran_once(kvm) || > (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) { > ret = -EBUSY; > break; > } > > kvm_arm_pmu_v3_set_pmu_fixed_counters_only() has the corresponding > condition for consistency: > > if (kvm_vm_has_ran_once(kvm) || > (kvm->arch.pmu_filter && > !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, > &kvm->arch.flags))) > return -EBUSY; > > We can of course kill the PMU event filter for > FIXED_COUNTERS_ONLY. The filter is effectively no-op with > FIXED_COUNTERS_ONLY and I don't think that consistency matters much. We shouldn't allow weird combinations in the UAPI. Since it makes no sense to have both fixed-function *and* filters, we should make them mutually exclusive. [...] > >> I expect migration will be handled with the conventional register > >> getters and setters, but please share if you have a concern. > > > > At the very least I want to see some documentation explaining that. > > What kind of documentation do you expect? A description of what counters are exposed by this feature, and what architectural features they are dependent on. > If we change kvm_vcpu_load_pmu() to avoid for_each_set_bit(), there > would be a good chance to forget updating it when mechanically > updating existing for_each_set_bit() instances, so it is a candidate > for documentation. But I don't have a good idea where to place it > either. The moment we introduce FEAT_PMUv3_ICNTR, the whole PMUv3 emulation will catch fire anyway, as we already limit ourselves to 32 bits for reset and nesting switch. So this will be a major redesign anyway. If you are really worried, leave a comment in there. M. -- Without deviation from the norm, progress is not possible.