From: Oliver Upton <oliver.upton@linux.dev>
To: Colton Lewis <coltonlewis@google.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, corbet@lwn.net,
linux@armlinux.org.uk, catalin.marinas@arm.com, will@kernel.org,
maz@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, mark.rutland@arm.com, shuah@kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-perf-users@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH 06/17] KVM: arm64: Introduce method to partition the PMU
Date: Tue, 3 Jun 2025 15:02:04 -0700 [thread overview]
Message-ID: <aD9w3Kj4-YoizKv5@linux.dev> (raw)
In-Reply-To: <gsnt4iww3406.fsf@coltonlewis-kvm.c.googlers.com>
On Tue, Jun 03, 2025 at 09:32:41PM +0000, Colton Lewis wrote:
> Oliver Upton <oliver.upton@linux.dev> writes:
>
> > On Mon, Jun 02, 2025 at 07:26:51PM +0000, Colton Lewis wrote:
> > > static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
> > > {
> > > + u8 hpmn = vcpu->kvm->arch.arm_pmu->hpmn;
> > > +
> > > preempt_disable();
>
> > > /*
> > > * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
> > > * to disable guest access to the profiling and trace buffers
> > > */
> > > - vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
> > > - *host_data_ptr(nr_event_counters));
> > > - vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> > > + vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
> > > + vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD |
> > > + MDCR_EL2_TPM |
>
> > This isn't safe, as there's no guarantee that kvm_arch::arm_pmu is
> > pointing that the PMU for this CPU. KVM needs to derive HPMN from some
> > per-CPU state, not anything tied to the VM/vCPU.
>
> I'm confused. Isn't this function preparing to run the vCPU on this
> CPU? Why would it be pointing at a different PMU?
Because arm64 is a silly ecosystem and system designers can glue
together heterogenous CPU implementations. The arm_pmu that KVM is
pointing at might only match a subset of CPUs, but vCPUs migrate at the
whim of the scheduler (and userspace).
> And HPMN is something that we only want set when running a vCPU, so
> there isn't any per-CPU state saying it should be anything but the
> default value (number of counters) outside that context.
>
> Unless you just mean I should check the number of counters again and
> make sure HPMN is not an invalid value.
As you've implemented it the host cannot schedule events in the guest
range of counters regardless of context. You need to reconcile that
global limit with the desires of the VMM on how many counters it wants
presented to this particular guest.
> > > +/**
> > > + * kvm_pmu_partition() - Partition the PMU
> > > + * @pmu: Pointer to pmu being partitioned
> > > + * @host_counters: Number of host counters to reserve
> > > + *
> > > + * Partition the given PMU by taking a number of host counters to
> > > + * reserve and, if it is a valid reservation, recording the
> > > + * corresponding HPMN value in the hpmn field of the PMU and clearing
> > > + * the guest-reserved counters from the counter mask.
> > > + *
> > > + * Passing 0 for @host_counters has the effect of disabling
> > > partitioning.
> > > + *
> > > + * Return: 0 on success, -ERROR otherwise
> > > + */
> > > +int kvm_pmu_partition(struct arm_pmu *pmu, u8 host_counters)
> > > +{
> > > + u8 nr_counters;
> > > + u8 hpmn;
> > > +
> > > + if (!kvm_pmu_reservation_is_valid(host_counters))
> > > + return -EINVAL;
> > > +
> > > + nr_counters = *host_data_ptr(nr_event_counters);
> > > + hpmn = kvm_pmu_hpmn(host_counters);
> > > +
> > > + if (hpmn < nr_counters) {
> > > + pmu->hpmn = hpmn;
> > > + /* Inform host driver of available counters */
> > > + bitmap_clear(pmu->cntr_mask, 0, hpmn);
> > > + bitmap_set(pmu->cntr_mask, hpmn, nr_counters);
> > > + clear_bit(ARMV8_PMU_CYCLE_IDX, pmu->cntr_mask);
> > > + if (pmuv3_has_icntr())
> > > + clear_bit(ARMV8_PMU_INSTR_IDX, pmu->cntr_mask);
> > > +
> > > + kvm_debug("Partitioned PMU with HPMN %u", hpmn);
> > > + } else {
> > > + pmu->hpmn = nr_counters;
> > > + bitmap_set(pmu->cntr_mask, 0, nr_counters);
> > > + set_bit(ARMV8_PMU_CYCLE_IDX, pmu->cntr_mask);
> > > + if (pmuv3_has_icntr())
> > > + set_bit(ARMV8_PMU_INSTR_IDX, pmu->cntr_mask);
> > > +
> > > + kvm_debug("Unpartitioned PMU");
> > > + }
> > > +
> > > + return 0;
> > > +}
>
> > Hmm... Just in terms of code organization I'm not sure I like having KVM
> > twiddling with *host* support for PMUv3. Feels like the ARM PMU driver
> > should own partitioning and KVM just takes what it can get.
>
> Okay. I can move the code.
>
> > > @@ -239,6 +245,13 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
> > > if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit()))
> > > return;
>
> > > + if (reserved_host_counters) {
> > > + if (kvm_pmu_partition_supported())
> > > + WARN_ON(kvm_pmu_partition(pmu, reserved_host_counters));
> > > + else
> > > + kvm_err("PMU Partition is not supported");
> > > + }
> > > +
>
> > Hasn't the ARM PMU been registered with perf at this point? Surely the
> > driver wouldn't be very pleased with us ripping counters out from under
> > its feet.
>
> AFAICT nothing in perf registration cares about the number of counters
> the PMU has. The PMUv3 driver tracks its own available counters through
> cntr_mask and I modify that during partition.
>
> Since this is still initialization of the PMU, I don't believe anything
> has had a chance to use a counter yet that will be ripped away.
Given that kvm_pmu_partition() is called from an ioctl, it is entirely
possible that events have been scheduled prior to applying the
partition.
> Aesthetically It makes since to change this if I move the partitioning
> code to the PMUv3 driver, but I think it's inconsequential to the
> function.
There are two *very* distinct functions w.r.t. partitioning:
1) Partitioning of a particular arm_pmu that says how many counters the
host can use
2) VMM intentions to present a subset of the KVM-owned counter
partition to its guest
#1 is modifying *global* state, we really can't mess with that in the
context of a single VM...
Thanks,
Oliver
next prev parent reply other threads:[~2025-06-03 22:02 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-02 19:26 [PATCH 00/17] ARM64 PMU Partitioning Colton Lewis
2025-06-02 19:26 ` [PATCH 01/17] arm64: cpufeature: Add cpucap for HPMN0 Colton Lewis
2025-06-02 22:15 ` Oliver Upton
2025-06-03 20:50 ` Colton Lewis
2025-06-02 19:26 ` [PATCH 02/17] arm64: Generate sign macro for sysreg Enums Colton Lewis
2025-06-02 19:26 ` [PATCH 03/17] arm64: cpufeature: Add cpucap for PMICNTR Colton Lewis
2025-06-02 19:26 ` [PATCH 04/17] KVM: arm64: Cleanup PMU includes Colton Lewis
2025-06-02 21:42 ` Sean Christopherson
2025-06-03 20:48 ` Colton Lewis
2025-06-02 19:26 ` [PATCH 05/17] KVM: arm64: Reorganize PMU functions Colton Lewis
2025-06-02 19:26 ` [PATCH 06/17] KVM: arm64: Introduce method to partition the PMU Colton Lewis
2025-06-02 22:28 ` Oliver Upton
2025-06-03 21:32 ` Colton Lewis
2025-06-03 22:02 ` Oliver Upton [this message]
2025-06-04 20:10 ` Colton Lewis
2025-06-04 20:57 ` Oliver Upton
2025-06-02 19:26 ` [PATCH 07/17] perf: arm_pmuv3: Generalize counter bitmasks Colton Lewis
2025-06-02 19:26 ` [PATCH 08/17] perf: arm_pmuv3: Keep out of guest counter partition Colton Lewis
2025-06-02 19:26 ` [PATCH 09/17] KVM: arm64: Set up FGT for Partitioned PMU Colton Lewis
2025-06-02 19:26 ` [PATCH 10/17] KVM: arm64: Writethrough trapped PMEVTYPER register Colton Lewis
2025-06-03 22:22 ` Oliver Upton
2025-06-04 20:10 ` Colton Lewis
2025-06-02 19:26 ` [PATCH 11/17] KVM: arm64: Use physical PMSELR for PMXEVTYPER if partitioned Colton Lewis
2025-06-02 19:26 ` [PATCH 12/17] KVM: arm64: Writethrough trapped PMOVS register Colton Lewis
2025-06-02 19:26 ` [PATCH 13/17] KVM: arm64: Context switch Partitioned PMU guest registers Colton Lewis
2025-06-02 19:26 ` [PATCH 14/17] perf: pmuv3: Handle IRQs for Partitioned PMU guest counters Colton Lewis
2025-06-02 19:27 ` [PATCH 15/17] KVM: arm64: Inject recorded guest interrupts Colton Lewis
2025-06-02 19:27 ` [PATCH 16/17] KVM: arm64: Add ioctl to partition the PMU when supported Colton Lewis
2025-06-02 22:40 ` Oliver Upton
2025-06-03 21:46 ` Colton Lewis
2025-06-04 20:12 ` Colton Lewis
2025-06-02 19:27 ` [PATCH 17/17] KVM: arm64: selftests: Add test case for partitioned PMU Colton Lewis
2025-06-03 22:43 ` [PATCH 00/17] ARM64 PMU Partitioning Oliver Upton
2025-06-04 20:10 ` Colton Lewis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aD9w3Kj4-YoizKv5@linux.dev \
--to=oliver.upton@linux.dev \
--cc=catalin.marinas@arm.com \
--cc=coltonlewis@google.com \
--cc=corbet@lwn.net \
--cc=joey.gouly@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=linux@armlinux.org.uk \
--cc=mark.rutland@arm.com \
--cc=maz@kernel.org \
--cc=pbonzini@redhat.com \
--cc=shuah@kernel.org \
--cc=suzuki.poulose@arm.com \
--cc=will@kernel.org \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.