From: Yosry Ahmed <yosry@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Jim Mattson <jmattson@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
Date: Fri, 1 May 2026 17:50:17 +0000
Message-ID: <afTnlv_cAWFfouqk@google.com>
In-Reply-To: <CAO9r8zNM-ECiiSKGS4Z7eM2S_06_6+fDO4mEeKiWFM-NFgHOUg@mail.gmail.com>

On Thu, Apr 30, 2026 at 08:34:59PM -0700, Yosry Ahmed wrote:
> On Thu, Apr 30, 2026 at 4:24 PM Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > > index 0e99022168a85..0c372b9f8ed34 100644
> > > --- a/arch/x86/kvm/pmu.h
> > > +++ b/arch/x86/kvm/pmu.h
> > > @@ -36,6 +36,7 @@ struct kvm_pmu_ops {
> > >  	void (*reset)(struct kvm_vcpu *vcpu);
> > >  	void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> > >  	void (*cleanup)(struct kvm_vcpu *vcpu);
> > > +	void (*reprogram_counters)(struct kvm_vcpu *vcpu, u64 counters);
> > >
> > >  	bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
> > >  	void (*mediated_load)(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > > index 7aa298eeb0721..fe6f2bb79ab83 100644
> > > --- a/arch/x86/kvm/svm/pmu.c
> > > +++ b/arch/x86/kvm/svm/pmu.c
> > > @@ -260,6 +260,48 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
> > >  	wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
> > >  }
> > >
> > > +static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
> > > +						    struct kvm_pmc *pmc)
> > > +{
> > > +	u64 host_guest_bits;
> > > +
> > > +	if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
> > > +		return;
> > > +
> > > +	/* Count all events if both bits are cleared */
> > > +	host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
> > > +	if (!host_guest_bits)
> > > +		return;
> > > +
> > > +	/*
> > > +	 * If EFER.SVME is set, the counter is disabled if only one of the bits
> > > +	 * is set and it doesn't match the vCPU context. If EFER.SVME is
> > > +	 * cleared, the counter is disabled if any of the bits is set.
> > > +	 */
> > > +	if (vcpu->arch.efer & EFER_SVME) {
> > > +		if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
> > > +			return;
> > > +
> > > +		if (!!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu))
> > > +			return;
> > > +	}
> > > +
> > > +	pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
> >
> > Sashiko raised a good point here. In the following patch, we reprogram
> > the counters synchronously on nested transitions to update the
> > counters' enablement before counting VMRUN or WRMSR(EFER.SVME).
> > However, this updates pmc->eventsel_hw, while
> > kvm_pmu_recalc_pmc_emulation() checks pmc->eventsel to determine
> > whether the counter is enabled.
> >
> > I think either pmc_is_locally_enabled() needs to check
> > pmc->eventsel_hw or we need to update pmc->eventsel here.
> >
> > AFAICT, pmc->eventsel holds the value written to the MSR, so I think we
> > want to keep that as-is.
> >
> > On the other hand, ARCH_PERFMON_EVENTSEL_ENABLE is cleared in
> > pmc->eventsel_hw in kvm_mediated_pmu_refresh_event_filter() if the
> > event is not allowed, and kvm_pmu_recalc_pmc_emulation() has a comment
> > about intentionally ignoring event filters.
> >
> > We can also separately track enablement for nested purposes, but that
> > would add one more thing we need to check aside from general counter
> > enablement and event filtering.
> >
> > None of these options is ideal; perhaps directly clearing the bit in
> > pmc->eventsel would do the least damage, as (pmc->eventsel &
> > ARCH_PERFMON_EVENTSEL_ENABLE) is only checked by
> > pmc_is_locally_enabled().
>
> No, we can't really clear the bit in pmc->eventsel as that's what the
> guest reads with RDMSR.
>
> The more I think about this, the more I think we should drop
> pmc->eventsel_hw. It serves two purposes AFAICT:
> 1. On AMD, we use it to clear HOSTONLY and set GUESTONLY in actual HW.
> 2. For event filtering, we use it to clear EVENTSEL_ENABLE.
>
> But maybe it's easier to explicitly track the changes we need to apply
> to eventsel rather than maintaining a full HW copy of it?
>
> (1) is trivial: we can just clear HOSTONLY and set GUESTONLY before
> doing the actual write, as that change is constant.
>
> For (2), instead of eventsel_hw, maybe add a boolean called 'filtered'
> to track whether that PMC should be filtered out. Then, we can add
> another boolean to track whether the counter is disabled due to
> nested/HG-bits (e.g. 'nested_disabled').
>
> With that, pmc_is_locally_enabled() would check:
> (pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) && !pmc->nested_disabled
>
> Then, before writing to HW, we would check pmc->filtered and
> pmc->nested_disabled, as well as make the HOSTONLY/GUESTONLY changes for
> AMD.
>
> Actually, instead of a boolean, maybe add 'disabled_reasons' flags,
> with possible flags like PMC_DISABLED_FILTER and PMC_DISABLED_NESTED.
>
> This might all be unclear, so I will draft a diff on top of the series
> tomorrow and send it in case it makes things clearer.

This is what I had in mind, essentially:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b191967c9c1e4..9c38c47581e75 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -515,11 +515,21 @@ enum pmc_type {
 	KVM_PMC_FIXED,
 };
 
+#define KVM_PMC_DISABLE_FILTERED	BIT(0)
+#define KVM_PMC_DISABLE_NESTED		BIT(1)
+
 struct kvm_pmc {
 	enum pmc_type type;
 	u8 idx;
 	bool is_paused;
 	bool intr;
+
+	/*
+	 * Reasons why the PMC should be disabled in eventsel when written to HW
+	 * with the mediated vPMU.
+	 */
+	u8 eventsel_hw_disable_reasons;
+
 	/*
 	 * Base value of the PMC counter, relative to the *consumed* count in
 	 * the associated perf_event. This value includes counter updates from
@@ -538,7 +548,6 @@ struct kvm_pmc {
 	 */
 	u64 emulated_counter;
 	u64 eventsel;
-	u64 eventsel_hw;
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
 	/*
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7b2b4ce6bdad9..5128858610d83 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -536,10 +536,9 @@ static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 
 	if (pmc_is_gp(pmc)) {
-		pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
-		if (allowed)
-			pmc->eventsel_hw |= pmc->eventsel &
-					    ARCH_PERFMON_EVENTSEL_ENABLE;
+		pmc->eventsel_hw_disable_reasons &= ~KVM_PMC_DISABLE_FILTERED;
+		if (!allowed)
+			pmc->eventsel_hw_disable_reasons |= KVM_PMC_DISABLE_FILTERED;
 	} else {
 		u64 mask = intel_fixed_bits_by_idx(pmc->idx - KVM_FIXED_PMC_BASE_IDX, 0xf);
 
@@ -630,7 +629,7 @@ void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc)
 	 * omitting a PMC from a bitmap could result in a missed event if the
 	 * filter is changed to allow counting the event.
 	 */
-	if (!pmc_is_locally_enabled(pmc))
+	if (!pmc_is_locally_enabled(pmc) || pmc_is_nested_disabled(pmc))
 		return;
 
 	if (pmc_is_event_match(pmc, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED))
@@ -944,7 +943,7 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
 
 		if (pmc_is_gp(pmc)) {
 			pmc->eventsel = 0;
-			pmc->eventsel_hw = 0;
+			pmc->eventsel_hw_disable_reasons = 0;
 		}
 	}
 
@@ -1313,6 +1312,19 @@ static __always_inline u32 gp_eventsel_msr(u32 idx)
 	return kvm_pmu_ops.GP_EVENTSEL_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
 }
 
+static __always_inline u64 gp_calc_eventsel_hw(struct kvm_pmc *pmc)
+{
+	u64 eventsel = pmc->eventsel;
+
+	if (pmc->eventsel_hw_disable_reasons)
+		eventsel &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+
+	eventsel |= kvm_pmu_ops.GP_EVENTSEL_ALWAYS_SET;
+	eventsel &= ~kvm_pmu_ops.GP_EVENTSEL_ALWAYS_CLR;
+
+	return eventsel;
+}
+
 static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -1329,7 +1341,7 @@ static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
 		if (pmc->counter != rdpmc(i))
 			wrmsrl(gp_counter_msr(i), pmc->counter);
-		wrmsrl(gp_eventsel_msr(i), pmc->eventsel_hw);
+		wrmsrl(gp_eventsel_msr(i), gp_calc_eventsel_hw(pmc));
 	}
 
 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
 		pmc = &pmu->fixed_counters[i];
@@ -1387,7 +1399,7 @@ static void kvm_pmu_put_guest_pmcs(struct kvm_vcpu *vcpu)
 		pmc->counter = rdpmc(i);
 		if (pmc->counter)
 			wrmsrq(gp_counter_msr(i), 0);
-		if (pmc->eventsel_hw)
+		if (gp_calc_eventsel_hw(pmc))
 			wrmsrq(gp_eventsel_msr(i), 0);
 	}
 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4a9148cf779df..fcfce6a213aef 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -52,6 +52,9 @@ struct kvm_pmu_ops {
 	const u32 GP_COUNTER_BASE;
 	const u32 FIXED_COUNTER_BASE;
 	const u32 MSR_STRIDE;
+
+	const u64 GP_EVENTSEL_ALWAYS_SET;
+	const u64 GP_EVENTSEL_ALWAYS_CLR;
 };
 
 extern bool enable_pmu;
@@ -197,6 +200,11 @@ static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc)
 	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
 }
 
+static inline bool pmc_is_nested_disabled(struct kvm_pmc *pmc)
+{
+	return pmc->eventsel_hw_disable_reasons & KVM_PMC_DISABLE_NESTED;
+}
+
 extern struct x86_pmu_capability kvm_pmu_cap;
 
 void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b12c35b4fccbf..2ac00e729d04b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -166,8 +166,6 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~pmu->reserved_bits;
 		if (data != pmc->eventsel) {
 			pmc->eventsel = data;
-			pmc->eventsel_hw = (data & ~AMD64_EVENTSEL_HOSTONLY) |
-					   AMD64_EVENTSEL_GUESTONLY;
 			kvm_pmu_request_counter_reprogram(pmc);
 		}
 		return 0;
@@ -270,6 +268,8 @@ static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u64 host_guest_bits;
 
+	pmc->eventsel_hw_disable_reasons &= ~KVM_PMC_DISABLE_NESTED;
+
 	if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
 		return;
 
@@ -296,7 +296,7 @@ static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
 		return;
 	}
 
-	pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+	pmc->eventsel_hw_disable_reasons |= KVM_PMC_DISABLE_NESTED;
 }
 
 static void amd_pmu_reprogram_counters(struct kvm_vcpu *vcpu, u64 counters)
@@ -336,4 +336,7 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.GP_COUNTER_BASE = MSR_F15H_PERF_CTR0,
 	.FIXED_COUNTER_BASE = 0,
 	.MSR_STRIDE = 2,
+
+	.GP_EVENTSEL_ALWAYS_SET = AMD64_EVENTSEL_GUESTONLY,
+	.GP_EVENTSEL_ALWAYS_CLR = AMD64_EVENTSEL_HOSTONLY,
 };
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9bd77843d8da2..05b68f40f189e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -431,7 +431,6 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		if (data != pmc->eventsel) {
 			pmc->eventsel = data;
-			pmc->eventsel_hw = data;
 			kvm_pmu_request_counter_reprogram(pmc);
 		}
 		break;
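
For reference, here is a standalone sketch (plain C, compilable in
userspace; not KVM code) of the enablement decision implemented by
amd_mediated_pmu_handle_host_guest_bits() as quoted at the top of this
mail. The bit positions are assumptions mirroring the kernel's
ARCH_PERFMON_EVENTSEL_ENABLE and AMD64_EVENTSEL_{GUEST,HOST}ONLY
definitions, and counter_enabled() is a purely illustrative helper:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions, mirroring arch/x86/include/asm/perf_event.h. */
#define EVENTSEL_ENABLE		(1ULL << 22)
#define EVENTSEL_GUESTONLY	(1ULL << 40)
#define EVENTSEL_HOSTONLY	(1ULL << 41)
#define HOST_GUEST_MASK		(EVENTSEL_HOSTONLY | EVENTSEL_GUESTONLY)

/* Mirrors the decision logic quoted above from patch 07. */
static bool counter_enabled(uint64_t eventsel, bool svme, bool guest_mode)
{
	uint64_t hg = eventsel & HOST_GUEST_MASK;

	if (!(eventsel & EVENTSEL_ENABLE))
		return false;
	if (!hg)			/* neither bit: count in all contexts */
		return true;
	if (!svme)			/* EFER.SVME clear: any H/G bit disables */
		return false;
	if (hg == HOST_GUEST_MASK)	/* both bits: count in all contexts */
		return true;
	/* Exactly one bit set: it must match the vCPU context. */
	return !!(hg & EVENTSEL_GUESTONLY) == guest_mode;
}

int main(void)
{
	static const struct { uint64_t hg; const char *name; } cfgs[] = {
		{ 0,			"none      " },
		{ EVENTSEL_HOSTONLY,	"host-only " },
		{ EVENTSEL_GUESTONLY,	"guest-only" },
		{ HOST_GUEST_MASK,	"both      " },
	};

	/* Columns: EFER.SVME clear, SVME set in L1, SVME set in L2. */
	printf("H/G bits    SVME=0  SVME=1,L1  SVME=1,L2\n");
	for (unsigned int i = 0; i < sizeof(cfgs) / sizeof(cfgs[0]); i++)
		printf("%s  %-6d  %-9d  %d\n", cfgs[i].name,
		       counter_enabled(EVENTSEL_ENABLE | cfgs[i].hg, false, false),
		       counter_enabled(EVENTSEL_ENABLE | cfgs[i].hg, true, false),
		       counter_enabled(EVENTSEL_ENABLE | cfgs[i].hg, true, true));
	return 0;
}

Running it prints a small truth table showing which Host-Only/Guest-Only
configurations keep a counter enabled with EFER.SVME clear, in L1, and in
L2, which may help when reviewing the nested-transition reprogramming in
the next patch.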