public inbox for kvm@vger.kernel.org
From: Yosry Ahmed <yosry@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jim Mattson <jmattson@google.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	 Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	 Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	 kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
Date: Fri, 1 May 2026 17:50:17 +0000	[thread overview]
Message-ID: <afTnlv_cAWFfouqk@google.com> (raw)
In-Reply-To: <CAO9r8zNM-ECiiSKGS4Z7eM2S_06_6+fDO4mEeKiWFM-NFgHOUg@mail.gmail.com>

On Thu, Apr 30, 2026 at 08:34:59PM -0700, Yosry Ahmed wrote:
> On Thu, Apr 30, 2026 at 4:24 PM Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > > index 0e99022168a85..0c372b9f8ed34 100644
> > > --- a/arch/x86/kvm/pmu.h
> > > +++ b/arch/x86/kvm/pmu.h
> > > @@ -36,6 +36,7 @@ struct kvm_pmu_ops {
> > >         void (*reset)(struct kvm_vcpu *vcpu);
> > >         void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> > >         void (*cleanup)(struct kvm_vcpu *vcpu);
> > > +       void (*reprogram_counters)(struct kvm_vcpu *vcpu, u64 counters);
> > >
> > >         bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
> > >         void (*mediated_load)(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > > index 7aa298eeb0721..fe6f2bb79ab83 100644
> > > --- a/arch/x86/kvm/svm/pmu.c
> > > +++ b/arch/x86/kvm/svm/pmu.c
> > > @@ -260,6 +260,48 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
> > >                 wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
> > >  }
> > >
> > > +static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
> > > +                                                   struct kvm_pmc *pmc)
> > > +{
> > > +       u64 host_guest_bits;
> > > +
> > > +       if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
> > > +               return;
> > > +
> > > +       /* Count all events if both bits are cleared */
> > > +       host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
> > > +       if (!host_guest_bits)
> > > +               return;
> > > +
> > > +       /*
> > > +        * If EFER.SVME is set, the counter is disabled if only one of the bits
> > > +        * is set and it doesn't match the vCPU context. If EFER.SVME is
> > > +        * cleared, the counter is disabled if any of the bits is set.
> > > +        */
> > > +       if (vcpu->arch.efer & EFER_SVME) {
> > > +               if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
> > > +                       return;
> > > +
> > > +               if (!!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu))
> > > +                       return;
> > > +       }
> > > +
> > > +       pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
> >
> > Sashiko raised a good point here. In the following patch, we reprogram
> > the counters synchronously on nested transitions to update the
> > counters' enablement before counting VMRUN or WRMSR(EFER.SVME).
> > However, this updates pmc->eventsel_hw, while
> > kvm_pmu_recalc_pmc_emulation() checks pmc->eventsel to determine
> > whether the counter is enabled.
> >
> > I think either pmc_is_locally_enabled() needs to check
> > pmc->eventsel_hw or we need to update pmc->eventsel here.
> >
> > AFAICT, pmc->eventsel has the value written to the MSR, so I think we
> > want to keep that as-is.
> >
> > On the other hand, ARCH_PERFMON_EVENTSEL_ENABLE is cleared in
> > pmc->eventsel_hw in kvm_mediated_pmu_refresh_event_filter() if the
> > event is not allowed, and kvm_pmu_recalc_pmc_emulation() has a comment
> > about intentionally ignoring event filters.
> >
> > We can also separately track enablement for nested purposes, but that
> > would add one more thing we need to check aside from general counter
> > enablement and event filtering.
> >
> > None of these options is ideal; perhaps directly clearing the bit in
> > pmc->eventsel would do the least damage as (pmc->eventsel &
> > ARCH_PERFMON_EVENTSEL_ENABLE) is only checked by
> > pmc_is_locally_enabled().
> 
> No, we can't really clear the bit in pmc->eventsel as that's what the
> guest reads with RDMSR.
> 
> The more I think about this, the more I think we should drop
> pmc->eventsel_hw. It serves two purposes AFAICT:
> 1. On AMD, we use it to clear HOSTONLY and set GUESTONLY in actual HW.
> 2. For event filtering, we use it to clear EVENTSEL_ENABLE.
> 
> But maybe it's easier to explicitly track the changes we need to apply
> to eventsel rather than a HW version?
> 
> (1) is trivial, we can just clear HOSTONLY and set GUESTONLY before
> doing the actual write as that's constant.
> 
> For (2), instead of eventsel_hw, maybe add a boolean called 'filtered'
> to track if that PMC should be filtered out or not. Then, we can add
> another boolean to track if the counter is disabled due to
> nested/HG-bits (e.g. 'nested_disabled').
> 
> With that, pmc_is_locally_enabled() would check:
>   (pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) && !pmc->nested_disabled
> 
> Then, before writing to HW, we check pmc->filtered and
> pmc->nested_disabled, and apply the HOSTONLY/GUESTONLY changes for
> AMD.
> 
> Actually, instead of a boolean, maybe add 'disabled_reasons' flags,
> with possible flags like PMC_DISABLED_FILTER and PMC_DISABLED_NESTED.
> 
> This might all be unclear, so I will draft a diff on top of the series
> tomorrow and send it in case it makes things clearer.

This is what I had in mind essentially:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b191967c9c1e4..9c38c47581e75 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -515,11 +515,21 @@ enum pmc_type {
 	KVM_PMC_FIXED,
 };
 
+#define KVM_PMC_DISABLE_FILTERED	BIT(0)
+#define KVM_PMC_DISABLE_NESTED		BIT(1)
+
 struct kvm_pmc {
 	enum pmc_type type;
 	u8 idx;
 	bool is_paused;
 	bool intr;
+
+	/*
+	 * Reasons why the PMC should be disabled in eventsel when written to HW
+	 * with the mediated vPMU.
+	 */
+	u8 eventsel_hw_disable_reasons;
+
 	/*
 	 * Base value of the PMC counter, relative to the *consumed* count in
 	 * the associated perf_event.  This value includes counter updates from
@@ -538,7 +548,6 @@ struct kvm_pmc {
 	 */
 	u64 emulated_counter;
 	u64 eventsel;
-	u64 eventsel_hw;
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
 	/*
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7b2b4ce6bdad9..5128858610d83 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -536,10 +536,9 @@ static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 
 	if (pmc_is_gp(pmc)) {
-		pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
-		if (allowed)
-			pmc->eventsel_hw |= pmc->eventsel &
-					    ARCH_PERFMON_EVENTSEL_ENABLE;
+		pmc->eventsel_hw_disable_reasons &= ~KVM_PMC_DISABLE_FILTERED;
+		if (!allowed)
+			pmc->eventsel_hw_disable_reasons |= KVM_PMC_DISABLE_FILTERED;
 	} else {
 		u64 mask = intel_fixed_bits_by_idx(pmc->idx - KVM_FIXED_PMC_BASE_IDX, 0xf);
 
@@ -630,7 +629,7 @@ void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc)
 	 * omitting a PMC from a bitmap could result in a missed event if the
 	 * filter is changed to allow counting the event.
 	 */
-	if (!pmc_is_locally_enabled(pmc))
+	if (!pmc_is_locally_enabled(pmc) || pmc_is_nested_disabled(pmc))
 		return;
 
 	if (pmc_is_event_match(pmc, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED))
@@ -944,7 +943,7 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
 
 		if (pmc_is_gp(pmc)) {
 			pmc->eventsel = 0;
-			pmc->eventsel_hw = 0;
+			pmc->eventsel_hw_disable_reasons = 0;
 		}
 	}
 
@@ -1313,6 +1312,19 @@ static __always_inline u32 gp_eventsel_msr(u32 idx)
 	return kvm_pmu_ops.GP_EVENTSEL_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
 }
 
+static __always_inline u64 gp_calc_eventsel_hw(struct kvm_pmc *pmc)
+{
+	u64 eventsel = pmc->eventsel;
+
+	if (pmc->eventsel_hw_disable_reasons)
+		eventsel &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+
+	eventsel |= kvm_pmu_ops.GP_EVENTSEL_ALWAYS_SET;
+	eventsel &= ~kvm_pmu_ops.GP_EVENTSEL_ALWAYS_CLR;
+
+	return eventsel;
+}
+
 static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -1329,7 +1341,7 @@ static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
 
 		if (pmc->counter != rdpmc(i))
 			wrmsrl(gp_counter_msr(i), pmc->counter);
-		wrmsrl(gp_eventsel_msr(i), pmc->eventsel_hw);
+		wrmsrl(gp_eventsel_msr(i), gp_calc_eventsel_hw(pmc));
 	}
 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
 		pmc = &pmu->fixed_counters[i];
@@ -1387,7 +1399,7 @@ static void kvm_pmu_put_guest_pmcs(struct kvm_vcpu *vcpu)
 		pmc->counter = rdpmc(i);
 		if (pmc->counter)
 			wrmsrq(gp_counter_msr(i), 0);
-		if (pmc->eventsel_hw)
+		if (gp_calc_eventsel_hw(pmc))
 			wrmsrq(gp_eventsel_msr(i), 0);
 	}
 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4a9148cf779df..fcfce6a213aef 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -52,6 +52,9 @@ struct kvm_pmu_ops {
 	const u32 GP_COUNTER_BASE;
 	const u32 FIXED_COUNTER_BASE;
 	const u32 MSR_STRIDE;
+
+	const u64 GP_EVENTSEL_ALWAYS_SET;
+	const u64 GP_EVENTSEL_ALWAYS_CLR;
 };
 
 extern bool enable_pmu;
@@ -197,6 +200,11 @@ static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc)
 	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
 }
 
+static inline bool pmc_is_nested_disabled(struct kvm_pmc *pmc)
+{
+	return pmc->eventsel_hw_disable_reasons & KVM_PMC_DISABLE_NESTED;
+}
+
 extern struct x86_pmu_capability kvm_pmu_cap;
 
 void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b12c35b4fccbf..2ac00e729d04b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -166,8 +166,6 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~pmu->reserved_bits;
 		if (data != pmc->eventsel) {
 			pmc->eventsel = data;
-			pmc->eventsel_hw = (data & ~AMD64_EVENTSEL_HOSTONLY) |
-					   AMD64_EVENTSEL_GUESTONLY;
 			kvm_pmu_request_counter_reprogram(pmc);
 		}
 		return 0;
@@ -270,6 +268,8 @@ static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u64 host_guest_bits;
 
+	pmc->eventsel_hw_disable_reasons &= ~KVM_PMC_DISABLE_NESTED;
+
 	if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
 		return;
 
@@ -296,7 +296,7 @@ static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
 			return;
 	}
 
-	pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+	pmc->eventsel_hw_disable_reasons |= KVM_PMC_DISABLE_NESTED;
 }
 
 static void amd_pmu_reprogram_counters(struct kvm_vcpu *vcpu, u64 counters)
@@ -336,4 +336,7 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.GP_COUNTER_BASE = MSR_F15H_PERF_CTR0,
 	.FIXED_COUNTER_BASE = 0,
 	.MSR_STRIDE = 2,
+
+	.GP_EVENTSEL_ALWAYS_SET = AMD64_EVENTSEL_GUESTONLY,
+	.GP_EVENTSEL_ALWAYS_CLR = AMD64_EVENTSEL_HOSTONLY,
 };
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9bd77843d8da2..05b68f40f189e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -431,7 +431,6 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 			if (data != pmc->eventsel) {
 				pmc->eventsel = data;
-				pmc->eventsel_hw = data;
 				kvm_pmu_request_counter_reprogram(pmc);
 			}
 			break;

Thread overview: 22+ messages
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 01/13] KVM: nSVM: Stop leaking single-stepping on VMRUN into L2 Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 02/13] KVM: nSVM: Bail early out of VMRUN emulation if advancing RIP fails Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 03/13] KVM: nSVM: Move VMRUN instruction retirement after entering guest mode Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 04/13] KVM: x86: Move enable_pmu/enable_mediated_pmu to pmu.h and pmu.c Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 05/13] KVM: x86/pmu: Rename reprogram_counters() to clarify usage Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 06/13] KVM: x86/pmu: Do a single atomic OR when reprogramming counters Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM Yosry Ahmed
2026-04-30 23:24   ` Yosry Ahmed
2026-05-01  3:34     ` Yosry Ahmed
2026-05-01 17:50       ` Yosry Ahmed [this message]
2026-05-05 18:11         ` Sean Christopherson
2026-05-05 18:23           ` Yosry Ahmed
2026-05-05 18:49             ` Sean Christopherson
2026-05-05 19:32               ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 08/13] KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 09/13] KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 10/13] KVM: selftests: Refactor allocating guest stack into a helper Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 11/13] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 12/13] KVM: selftests: Drop L1-provided stacks for L2 guests on x86 Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 13/13] KVM: selftests: Add svm_pmu_host_guest_test for Host-Only/Guest-Only bits Yosry Ahmed
2026-04-30 20:38 ` [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD " Yosry Ahmed
