All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosry@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jim Mattson <jmattson@google.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	 Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	 Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	 kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
Date: Fri, 1 May 2026 17:50:17 +0000	[thread overview]
Message-ID: <afTnlv_cAWFfouqk@google.com> (raw)
In-Reply-To: <CAO9r8zNM-ECiiSKGS4Z7eM2S_06_6+fDO4mEeKiWFM-NFgHOUg@mail.gmail.com>

On Thu, Apr 30, 2026 at 08:34:59PM -0700, Yosry Ahmed wrote:
> On Thu, Apr 30, 2026 at 4:24 PM Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > > index 0e99022168a85..0c372b9f8ed34 100644
> > > --- a/arch/x86/kvm/pmu.h
> > > +++ b/arch/x86/kvm/pmu.h
> > > @@ -36,6 +36,7 @@ struct kvm_pmu_ops {
> > >         void (*reset)(struct kvm_vcpu *vcpu);
> > >         void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> > >         void (*cleanup)(struct kvm_vcpu *vcpu);
> > > +       void (*reprogram_counters)(struct kvm_vcpu *vcpu, u64 counters);
> > >
> > >         bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
> > >         void (*mediated_load)(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > > index 7aa298eeb0721..fe6f2bb79ab83 100644
> > > --- a/arch/x86/kvm/svm/pmu.c
> > > +++ b/arch/x86/kvm/svm/pmu.c
> > > @@ -260,6 +260,48 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
> > >                 wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
> > >  }
> > >
> > > +static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
> > > +                                                   struct kvm_pmc *pmc)
> > > +{
> > > +       u64 host_guest_bits;
> > > +
> > > +       if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
> > > +               return;
> > > +
> > > +       /* Count all events if both bits are cleared */
> > > +       host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
> > > +       if (!host_guest_bits)
> > > +               return;
> > > +
> > > +       /*
> > > +        * If EFER.SVME is set, the counter is disabledd if only one of the bits
> > > +        * is set and it doesn't match the vCPU context. If EFER.SVME is
> > > +        * cleared, the counter is disable if any of the bits is set.
> > > +        */
> > > +       if (vcpu->arch.efer & EFER_SVME) {
> > > +               if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
> > > +                       return;
> > > +
> > > +               if (!!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu))
> > > +                       return;
> > > +       }
> > > +
> > > +       pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
> >
> > Sashiko raised a good point here. In the following patch, we reprogram
> > the counters synchronously on nested transitions to update the
> > counters' enablement before counting VMRUN or WRMSR(EFER.SVME).
> > However, this updates pmc->eventsel_hw while
> > kvm_pmu_recalc_pmc_emulation() checks pmc->eventsel to check if the
> > counter is enabled.
> >
> > I think either pmc_is_locally_enabled() needs to check
> > pmc->eventsel_hw or we need to update pmc->eventsel here.
> >
> > AFAICT, pmc->eventself has the value written to the MSR, so I think we
> > want to keep that as-is.
> >
> > On the other hand, ARCH_PERFMON_EVENTSEL_ENABLE is cleared in
> > pmc->eventsel_hw in kvm_mediated_pmu_refresh_event_filter() if the
> > event is not allowed, and kvm_pmu_recalc_pmc_emulation() has a comment
> > about intentionally ignoring event filters.
> >
> > We can also separately track enablement for nested purposes, but that
> > would add one more thing we need to check aside from general counter
> > enablement and event filtering.
> >
> > None of these options are ideal, perhaps directly clearing the bit in
> > pmc->eventsel would do the least damage as (pmc->eventsel &
> > ARCH_PERFMON_EVENTSEL_ENABLE) is only checked by
> > pmc_is_locally_enabled().
> 
> No, we can't really clear the bit in pmc->eventsel as that's what the
> guest reads with RDMSR.
> 
> The more I think about this, the more I think we should drop
> pmc->eventsel_hw. It serves two purposes AFAICT:
> 1. On AMD, we use it to clear HOSTONLY and set GUESTONLY in actual HW.
> 2. For event filtering, we use it to clear EVENTSEL_ENABLE.
> 
> But maybe it's easier to explicitly track the changes we need to apply
> to eventsel rather than a HW version?
> 
> (1) is trivial, we can just clear HOSTONLY and set GUESTONLY before
> doing the actual write as that's constant.
> 
> For (2), instead of eventsel_hw, maybe add a boolean called 'filtered'
> to track if that PMC should be filtered out or not. Then, we can add
> another boolean to track if the counter is disabled due to
> nested/HG-bits (e.g. 'nested_disabled').
> 
> With that, pmc_is_locally_enabled() would check:
>   (pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) && !pmc->nested_disabled
> 
> , and then before writing to HW we check pmc->filtered,
> pmc->nested_disabled, as well as do the HOSTONLY/GUESTONLY changes for
> AMD.
> 
> Actually, instead of a boolean, maybe add 'disabled_reasons' flags,
> with possible flags like PMC_DISABLED_FILTER and PMC_DISABLED_NESTED.
> 
> This might all be unclear, I will draft some diff on top of the series
> tomorrow and send it in case it makes things clearer.

This is what I had in mind essentially:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b191967c9c1e4..9c38c47581e75 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -515,11 +515,21 @@ enum pmc_type {
 	KVM_PMC_FIXED,
 };
 
+#define KVM_PMC_DISABLE_FILTERED	BIT(0)
+#define KVM_PMC_DISABLE_NESTED		BIT(1)
+
 struct kvm_pmc {
 	enum pmc_type type;
 	u8 idx;
 	bool is_paused;
 	bool intr;
+
+	/*
+	 * Reasons why the PMC should be disabled in eventsel when written to HW
+	 * with the mediated vPMU.
+	 */
+	u8 eventsel_hw_disable_reasons;
+
 	/*
 	 * Base value of the PMC counter, relative to the *consumed* count in
 	 * the associated perf_event.  This value includes counter updates from
@@ -538,7 +548,6 @@ struct kvm_pmc {
 	 */
 	u64 emulated_counter;
 	u64 eventsel;
-	u64 eventsel_hw;
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
 	/*
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7b2b4ce6bdad9..5128858610d83 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -536,10 +536,9 @@ static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 
 	if (pmc_is_gp(pmc)) {
-		pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
-		if (allowed)
-			pmc->eventsel_hw |= pmc->eventsel &
-					    ARCH_PERFMON_EVENTSEL_ENABLE;
+		pmc->eventsel_hw_disable_reasons &= ~KVM_PMC_DISABLE_FILTERED;
+		if (!allowed)
+			pmc->eventsel_hw_disable_reasons |= KVM_PMC_DISABLE_FILTERED;
 	} else {
 		u64 mask = intel_fixed_bits_by_idx(pmc->idx - KVM_FIXED_PMC_BASE_IDX, 0xf);
 
@@ -630,7 +629,7 @@ void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc)
 	 * omitting a PMC from a bitmap could result in a missed event if the
 	 * filter is changed to allow counting the event.
 	 */
-	if (!pmc_is_locally_enabled(pmc))
+	if (!pmc_is_locally_enabled(pmc) || pmc_is_nested_disabled(pmc))
 		return;
 
 	if (pmc_is_event_match(pmc, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED))
@@ -944,7 +943,7 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
 
 		if (pmc_is_gp(pmc)) {
 			pmc->eventsel = 0;
-			pmc->eventsel_hw = 0;
+			pmc->eventsel_hw_disable_reasons = 0;
 		}
 	}
 
@@ -1313,6 +1312,19 @@ static __always_inline u32 gp_eventsel_msr(u32 idx)
 	return kvm_pmu_ops.GP_EVENTSEL_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
 }
 
+static __always_inline u64 gp_calc_eventsel_hw(struct kvm_pmc *pmc)
+{
+	u64 eventsel = pmc->eventsel;
+
+	if (pmc->eventsel_hw_disable_reasons)
+		eventsel &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+
+	eventsel |= kvm_pmu_ops.GP_EVENTSEL_ALWAYS_SET;
+	eventsel &= ~kvm_pmu_ops.GP_EVENTSEL_ALWAYS_CLR;
+
+	return eventsel;
+}
+
 static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -1329,7 +1341,7 @@ static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
 
 		if (pmc->counter != rdpmc(i))
 			wrmsrl(gp_counter_msr(i), pmc->counter);
-		wrmsrl(gp_eventsel_msr(i), pmc->eventsel_hw);
+		wrmsrl(gp_eventsel_msr(i), gp_calc_eventsel_hw(pmc));
 	}
 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
 		pmc = &pmu->fixed_counters[i];
@@ -1387,7 +1399,7 @@ static void kvm_pmu_put_guest_pmcs(struct kvm_vcpu *vcpu)
 		pmc->counter = rdpmc(i);
 		if (pmc->counter)
 			wrmsrq(gp_counter_msr(i), 0);
-		if (pmc->eventsel_hw)
+		if (gp_calc_eventsel_hw(pmc))
 			wrmsrq(gp_eventsel_msr(i), 0);
 	}
 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4a9148cf779df..fcfce6a213aef 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -52,6 +52,9 @@ struct kvm_pmu_ops {
 	const u32 GP_COUNTER_BASE;
 	const u32 FIXED_COUNTER_BASE;
 	const u32 MSR_STRIDE;
+
+	const u64 GP_EVENTSEL_ALWAYS_SET;
+	const u64 GP_EVENTSEL_ALWAYS_CLR;
 };
 
 extern bool enable_pmu;
@@ -197,6 +200,11 @@ static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc)
 	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
 }
 
+static inline bool pmc_is_nested_disabled(struct kvm_pmc *pmc)
+{
+	return pmc->eventsel_hw_disable_reasons & KVM_PMC_DISABLE_NESTED;
+}
+
 extern struct x86_pmu_capability kvm_pmu_cap;
 
 void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b12c35b4fccbf..2ac00e729d04b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -166,8 +166,6 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~pmu->reserved_bits;
 		if (data != pmc->eventsel) {
 			pmc->eventsel = data;
-			pmc->eventsel_hw = (data & ~AMD64_EVENTSEL_HOSTONLY) |
-					   AMD64_EVENTSEL_GUESTONLY;
 			kvm_pmu_request_counter_reprogram(pmc);
 		}
 		return 0;
@@ -270,6 +268,8 @@ static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u64 host_guest_bits;
 
+	pmc->eventsel_hw_disable_reasons &= ~KVM_PMC_DISABLE_NESTED;
+
 	if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
 		return;
 
@@ -296,7 +296,7 @@ static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
 			return;
 	}
 
-	pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+	pmc->eventsel_hw_disable_reasons |= KVM_PMC_DISABLE_NESTED;
 }
 
 static void amd_pmu_reprogram_counters(struct kvm_vcpu *vcpu, u64 counters)
@@ -336,4 +336,7 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.GP_COUNTER_BASE = MSR_F15H_PERF_CTR0,
 	.FIXED_COUNTER_BASE = 0,
 	.MSR_STRIDE = 2,
+
+	.GP_EVENTSEL_ALWAYS_SET = AMD64_EVENTSEL_GUESTONLY,
+	.GP_EVENTSEL_ALWAYS_CLR = AMD64_EVENTSEL_HOSTONLY,
 };
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9bd77843d8da2..05b68f40f189e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -431,7 +431,6 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 			if (data != pmc->eventsel) {
 				pmc->eventsel = data;
-				pmc->eventsel_hw = data;
 				kvm_pmu_request_counter_reprogram(pmc);
 			}
 			break;

  reply	other threads:[~2026-05-01 17:50 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 01/13] KVM: nSVM: Stop leaking single-stepping on VMRUN into L2 Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 02/13] KVM: nSVM: Bail early out of VMRUN emulation if advancing RIP fails Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 03/13] KVM: nSVM: Move VMRUN instruction retirement after entering guest mode Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 04/13] KVM: x86: Move enable_pmu/enable_mediated_pmu to pmu.h and pmu.c Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 05/13] KVM: x86/pmu: Rename reprogram_counters() to clarify usage Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 06/13] KVM: x86/pmu: Do a single atomic OR when reprogramming counters Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM Yosry Ahmed
2026-04-30 23:24   ` Yosry Ahmed
2026-05-01  3:34     ` Yosry Ahmed
2026-05-01 17:50       ` Yosry Ahmed [this message]
2026-05-05 18:11         ` Sean Christopherson
2026-05-05 18:23           ` Yosry Ahmed
2026-05-05 18:49             ` Sean Christopherson
2026-05-05 19:32               ` Yosry Ahmed
2026-05-05 19:58                 ` Sean Christopherson
2026-05-05 20:24                   ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 08/13] KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 09/13] KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 10/13] KVM: selftests: Refactor allocating guest stack into a helper Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 11/13] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 12/13] KVM: selftests: Drop L1-provided stacks for L2 guests on x86 Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 13/13] KVM: selftests: Add svm_pmu_host_guest_test for Host-Only/Guest-Only bits Yosry Ahmed
2026-04-30 20:38 ` [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD " Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afTnlv_cAWFfouqk@google.com \
    --to=yosry@kernel.org \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.