From: Yosry Ahmed <yosry@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Jim Mattson <jmattson@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Yosry Ahmed <yosry@kernel.org>
Subject: [PATCH v5 08/13] KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions
Date: Thu, 30 Apr 2026 20:27:45 +0000
Message-ID: <20260430202750.3924147-9-yosry@kernel.org>
In-Reply-To: <20260430202750.3924147-1-yosry@kernel.org>
Reprogram PMU counters on nested transitions for the mediated PMU to
re-evaluate the Host-Only and Guest-Only bits and enable or disable the
counters accordingly. For example, if Host-Only is set and Guest-Only is
cleared, a counter should be disabled when entering guest mode and
enabled when exiting guest mode.
According to the APM, when EFER.SVME is cleared, setting Host-Only or
Guest-Only disables the counter, so also trigger counter reprogramming
when EFER.SVME is toggled.
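To illustrate, the enablement decision boils down to something like the
following sketch. The helper is hypothetical (its name, signature, and
placement are illustrative, not the actual KVM code); the bit macros are
assumed to be the existing AMD64_EVENTSEL_{HOSTONLY,GUESTONLY}
definitions:

  /* Hypothetical sketch: should a counter be counting in this context? */
  static bool pmc_should_count(u64 eventsel, bool svme, bool guest_mode)
  {
          bool host_only  = eventsel & AMD64_EVENTSEL_HOSTONLY;
          bool guest_only = eventsel & AMD64_EVENTSEL_GUESTONLY;

          /* Neither bit set: count in all modes, no reprogramming needed. */
          if (!host_only && !guest_only)
                  return true;

          /* EFER.SVME clear: either bit being set disables the counter. */
          if (!svme)
                  return false;

          /* Both bits set: count in both host and guest mode. */
          if (host_only && guest_only)
                  return true;

          /* Exactly one bit set: count only in the matching context. */
          return guest_only == guest_mode;
  }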
In a bitmap, track counters with either Host-Only or Guest-Only set as
counters requiring reprogramming on nested transitions. Track such
counters even if EFER.SVME is cleared, as counters with Host-Only or
Guest-Only set still need to be reprogrammed when EFER.SVME is toggled.
Reprogram the counters synchronously on nested VMRUN/#VMEXIT and on
EFER.SVME toggling. This is necessary because these transitions are
counted based on the new CPU state (i.e. after the instruction retires
in hardware). Hence, the PMU needs to be updated before instruction
emulation completes and kvm_pmu_instruction_retired() is called.
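For illustration, the emulated VMRUN flow ends up ordered roughly as
follows (a simplified sketch; see the nested.c hunk below for the actual
hook placement):

  enter_guest_mode(vcpu);
  /* Synchronous reprogram: a retired VMRUN is counted in guest mode. */
  svm_pmu_handle_nested_transition(svm);
  ...
  /* Runs once emulation completes and sees the already-updated PMU. */
  kvm_pmu_instruction_retired(vcpu);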
Defer reprogramming the counters when force-leaving guest mode through
svm_leave_nested() to avoid potentially reading stale state (e.g. an
incorrect EFER). All flows that force leaving nested mode are
non-architectural, so precision is not a priority.
Refactor a helper out of kvm_pmu_request_counters_reprogram() that
accepts a boolean selecting synchronous vs. deferred reprogramming, and
use it from SVM code to support both scenarios.
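Roughly, the resulting split looks as follows (a sketch; the actual
definitions are in the pmu.h and svm.h hunks below):

  /* Existing callers: always defer to the next KVM_REQ_PMU handling. */
  kvm_pmu_request_counters_reprogram(pmu, counters);

  /*
   * SVM passes defer=false on nested VMRUN/#VMEXIT and EFER.SVME writes,
   * and defer=true when force-leaving nested mode.
   */
  __kvm_pmu_reprogram_counters(pmu, counters, defer);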
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.c | 1 +
arch/x86/kvm/pmu.h | 18 ++++++++++++++----
arch/x86/kvm/svm/nested.c | 12 ++++++++++++
arch/x86/kvm/svm/pmu.c | 12 +++++++++---
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/svm/svm.h | 33 +++++++++++++++++++++++++++++++++
6 files changed, 71 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 5e3a10e0a54ff..7b2b4ce6bdad9 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -684,6 +684,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
kvm_for_each_pmc(pmu, pmc, bit, bitmap)
kvm_pmu_recalc_pmc_emulation(pmu, pmc);
}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_handle_event);
int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx)
{
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 0c372b9f8ed34..4a9148cf779df 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -202,6 +202,7 @@ extern struct x86_pmu_capability kvm_pmu_cap;
void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops);
void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc);
+void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
{
@@ -211,14 +212,24 @@ static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
}
-static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
- u64 counters)
+static inline void __kvm_pmu_reprogram_counters(struct kvm_pmu *pmu,
+ u64 counters,
+ bool defer)
{
if (!counters)
return;
atomic64_or(counters, &pmu->__reprogram_pmi);
- kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
+ if (defer)
+ kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
+ else
+ kvm_pmu_handle_event(pmu_to_vcpu(pmu));
+}
+
+static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
+ u64 counters)
+{
+ __kvm_pmu_reprogram_counters(pmu, counters, true);
}
/*
@@ -247,7 +258,6 @@ static inline bool kvm_pmu_is_fastpath_emulation_allowed(struct kvm_vcpu *vcpu)
}
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
-void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 58c78c889a812..bb3362c043395 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -826,6 +826,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
/* Enter Guest-Mode */
enter_guest_mode(vcpu);
+ svm_pmu_handle_nested_transition(svm);
/*
* Filled at exit: exit_code, exit_info_1, exit_info_2, exit_int_info,
@@ -1302,6 +1303,8 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
/* Exit Guest-Mode */
leave_guest_mode(vcpu);
+ svm_pmu_handle_nested_transition(svm);
+
svm->nested.vmcb12_gpa = 0;
kvm_warn_on_nested_run_pending(vcpu);
@@ -1519,6 +1522,15 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
leave_guest_mode(vcpu);
+ /*
+ * Force leaving nested is a non-architectural flow so precision
+ * is not a priority. Defer updating the PMU until the next vCPU
+ * run, potentially tolerating some imprecision to avoid poking
+ * into PMU state from arbitrary contexts (e.g. KVM may end up
+ * using stale state).
+ */
+ __svm_pmu_handle_nested_transition(svm, true);
+
svm_switch_vmcb(svm, &svm->vmcb01);
nested_svm_uninit_mmu_context(vcpu);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index fe6f2bb79ab83..902d7eb4a461b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -263,20 +263,26 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
struct kvm_pmc *pmc)
{
+ struct vcpu_svm *svm = to_svm(vcpu);
u64 host_guest_bits;
if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
return;
- /* Count all events if both bits are cleared */
+ /*
+ * If both bits are cleared, always keep the counter enabled. Otherwise,
+ * counter enablement needs to be re-evaluated on every nested
+ * transition (and EFER.SVME change).
+ */
host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
if (!host_guest_bits)
return;
+ __set_bit(pmc->idx, svm->nested.reprogram_pmcs_on_nested_transitions);
/*
- * If EFER.SVME is set, the counter is disabledd if only one of the bits
+ * If EFER.SVME is set, the counter is disabled if only one of the bits
* is set and it doesn't match the vCPU context. If EFER.SVME is
- * cleared, the counter is disable if any of the bits is set.
+ * cleared, the counter is disabled if any of the bits is set.
*/
if (vcpu->arch.efer & EFER_SVME) {
if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e7fdd7a9c280d..7ffa3c9033d0f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -261,6 +261,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
set_exception_intercept(svm, GP_VECTOR);
}
+ svm_pmu_handle_nested_transition(svm);
kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
}
@@ -1214,6 +1215,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
svm->nested.vmcb12_gpa = INVALID_GPA;
svm->nested.last_vmcb12_gpa = INVALID_GPA;
+ bitmap_zero(svm->nested.reprogram_pmcs_on_nested_transitions, X86_PMC_IDX_MAX);
if (!kvm_pause_in_guest(vcpu->kvm)) {
control->pause_filter_count = pause_filter_count;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a10668d17a16a..8709e87621d21 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -24,6 +24,7 @@
#include "cpuid.h"
#include "kvm_cache_regs.h"
+#include "pmu.h"
/*
* Helpers to convert to/from physical addresses for pages whose address is
@@ -238,6 +239,13 @@ struct svm_nested_state {
* on its side.
*/
bool force_msr_bitmap_recalc;
+
+ /*
+ * PMU counters where Host-Only or Guest-Only bits are used need to be
+ * reprogrammed on nested transitions and EFER.SVME changes to correctly
+ * enable/disable the counters based on the vCPU state.
+ */
+ DECLARE_BITMAP(reprogram_pmcs_on_nested_transitions, X86_PMC_IDX_MAX);
};
struct vcpu_sev_es_state {
@@ -877,6 +885,31 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
+
+static inline void __svm_pmu_handle_nested_transition(struct vcpu_svm *svm, bool defer)
+{
+ u64 counters = *(u64 *)svm->nested.reprogram_pmcs_on_nested_transitions;
+
+ if (!counters)
+ return;
+
+ /* Reprogramming sets the bit again for PMCs that still need tracking */
+ bitmap_zero(svm->nested.reprogram_pmcs_on_nested_transitions, X86_PMC_IDX_MAX);
+ __kvm_pmu_reprogram_counters(vcpu_to_pmu(&svm->vcpu), counters, defer);
+}
+
+static inline void svm_pmu_handle_nested_transition(struct vcpu_svm *svm)
+{
+ /*
+ * Do NOT defer reprogramming the counters by default. Instructions
+ * causing a state change are counted based on the _new_ CPU state
+ * (e.g. a successful VMRUN is counted in guest mode). Hence, the
+ * counters should be reprogrammed with the new state _before_ the
+ * instruction is potentially counted upon emulation completion.
+ */
+ __svm_pmu_handle_nested_transition(svm, false);
+}
+
extern struct kvm_x86_nested_ops svm_nested_ops;
/* avic.c */
--
2.54.0.545.g6539524ca2-goog