public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/4] perf/x86: Don't write PEBS_ENABLED on KVM transitions
@ 2026-04-14 19:14 Sean Christopherson
  2026-04-14 19:14 ` [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation Sean Christopherson
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Sean Christopherson @ 2026-04-14 19:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Sean Christopherson, Paolo Bonzini
  Cc: linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

Rework the handling of PEBS_ENABLED (and related PEBS MSRs) to *never* touch
PEBS_ENABLED if the CPU provides PEBS isolation, in which case disabling
counters via PERF_GLOBAL_CTRL is sufficient to prevent generation of unwanted
PEBS records.  For vCPUs without PEBS enabled, this saves upwards of 6 MSR
writes on each roundtrip between the guest and host.  For vCPUs with PEBS,
this saves 2 MSR writes per roundtrip.

However, performance isn't the underlying motivation.  We (more accurately,
Jim, Mingwei, and Stephane) have been chasing issues where PEBS_ENABLED bits
can get "stuck" in a '1' state when running KVM guests while profiling the host
with PEBS events.  The working theory is that perf throttles PEBS events in
NMI context, and thus clears bits in cpuc->pebs_enabled and PEBS_ENABLED, after
generating the list of PMU MSRs to context switch but before VM-Entry.  And so
when the host's PEBS_ENABLED is loaded on VM-Exit, the CPU ends up with a
stale PEBS_ENABLED that doesn't get reset until something triggers an explicit
reload in perf.

Testing this against our "PEBS_ENABLED is stuck" reproducer is a work in
progress (largely because the "reproducer" is currently "throw the kernel in
a big test pool"), i.e. I don't know if this actually resolves the problems we
are seeing.  But even if it doesn't fully resolve our woes, it seems like a
no-brainer improvement, and if we're missing something with respect to "stuck"
PEBS_ENABLED, it'd be nice to get feedback/input asap.

Note, if the throttling theory is correct, then there are likely more fixes
that need to be done, e.g. for CPUs without isolation, and/or if
PERF_GLOBAL_CTRL can be modified from NMI context too.

Patch 4 is a cleanup that I posted as a standalone patch almost a year ago.
I included it here because it's closely related, and because I needed to refresh
it anyway.

Sean Christopherson (4):
  perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU
    has isolation
  perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS
    is unused
  perf/x86/intel: Make @data a mandatory param for
    intel_guest_get_msrs()
  perf/x86: KVM: Have perf define a dedicated struct for getting guest
    PEBS data

 arch/x86/events/core.c            |  5 ++-
 arch/x86/events/intel/core.c      | 71 +++++++++++++++++++------------
 arch/x86/events/perf_event.h      |  3 +-
 arch/x86/include/asm/kvm_host.h   |  9 ----
 arch/x86/include/asm/perf_event.h | 12 +++++-
 arch/x86/kvm/vmx/pmu_intel.c      | 20 +++++++--
 arch/x86/kvm/vmx/vmx.c            | 11 +++--
 arch/x86/kvm/vmx/vmx.h            |  2 +-
 8 files changed, 84 insertions(+), 49 deletions(-)


base-commit: 6b802031877a995456c528095c41d1948546bf45
-- 
2.54.0.rc0.605.g598a273b03-goog


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation
  2026-04-14 19:14 [PATCH 0/4] perf/x86: Don't write PEBS_ENABLED on KVM transitions Sean Christopherson
@ 2026-04-14 19:14 ` Sean Christopherson
  2026-04-16 18:24   ` Namhyung Kim
  2026-04-14 19:14 ` [PATCH 2/4] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused Sean Christopherson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2026-04-14 19:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Sean Christopherson, Paolo Bonzini
  Cc: linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

When filling the list of MSRs to be loaded by KVM on VM-Enter and VM-Exit,
*never* insert an entry for PEBS_ENABLED if the CPU properly isolates PEBS
events, in which case disabling counters via PERF_GLOBAL_CTRL is sufficient
to prevent unwanted PEBS events in the guest (or host).  Because perf loads
PEBS_ENABLED with the unfiltered cpu_hw_events.pebs_enabled, i.e. with both
host and guest masks, there is no need to load different values for the
guest versus host; perf+KVM can and should simply control which counters
are enabled/disabled via PERF_GLOBAL_CTRL.

Avoiding writes to PEBS_ENABLED fixes a bug where PEBS_ENABLED can end up
with "stuck" bits if a PEBS event is throttled between generating the list
and actually entering the guest (Intel CPUs can't arbitrarily block NMIs).
And stating the obvious, leaving PEBS_ENABLED as-is avoids two MSR writes
on every VMX transition.

Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Cc: Jim Mattson <jmattson@google.com>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/intel/core.c | 42 ++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 793335c3ce78..002d809f82ef 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4999,12 +4999,15 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 	struct kvm_pmu *kvm_pmu = (struct kvm_pmu *)data;
 	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 	u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
-	int global_ctrl, pebs_enable;
+	u64 guest_pebs_mask = pebs_mask & ~cpuc->intel_ctrl_host_mask;
+	int global_ctrl;
 
 	/*
 	 * In addition to obeying exclude_guest/exclude_host, remove bits being
 	 * used for PEBS when running a guest, because PEBS writes to virtual
-	 * addresses (not physical addresses).
+	 * addresses (not physical addresses).  If the guest wants to utilize
+	 * PEBS, and PEBS can be safely enabled in the guest, bits for the guest's
+	 * PEBS-enabled counters will be OR'd back in as appropriate.
 	 */
 	*nr = 0;
 	global_ctrl = (*nr)++;
@@ -5051,24 +5054,25 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 		};
 	}
 
-	pebs_enable = (*nr)++;
-	arr[pebs_enable] = (struct perf_guest_switch_msr){
-		.msr = MSR_IA32_PEBS_ENABLE,
-		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
-		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable,
-	};
-
-	if (arr[pebs_enable].host) {
-		/* Disable guest PEBS if host PEBS is enabled. */
-		arr[pebs_enable].guest = 0;
-	} else {
-		/* Disable guest PEBS thoroughly for cross-mapped PEBS counters. */
-		arr[pebs_enable].guest &= ~kvm_pmu->host_cross_mapped_mask;
-		arr[global_ctrl].guest &= ~kvm_pmu->host_cross_mapped_mask;
-		/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
-		arr[global_ctrl].guest |= arr[pebs_enable].guest;
-	}
+	/*
+	 * Disable counters where the guest PMC is different than the host PMC
+	 * being used on behalf of the guest, as the PEBS record includes
+	 * PERF_GLOBAL_STATUS, i.e. the guest will see overflow status for the
+	 * wrong counter(s).  Similarly, disallow PEBS in the guest if the host
+	 * is using PEBS, to avoid bleeding host state into PEBS records.
+	 */
+	guest_pebs_mask &= kvm_pmu->pebs_enable & ~kvm_pmu->host_cross_mapped_mask;
+	if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
+		guest_pebs_mask = 0;
 
+	/*
+	 * Do NOT mess with PEBS_ENABLED.  As above, disabling counters via
+	 * PERF_GLOBAL_CTRL is sufficient, and loading a stale PEBS_ENABLED,
+	 * e.g. on VM-Exit, can put the system in a bad state.  Simply enable
+	 * counters in PERF_GLOBAL_CTRL, as perf loads PEBS_ENABLED with the
+	 * full value, i.e. perf *also* relies on PERF_GLOBAL_CTRL.
+	 */
+	arr[global_ctrl].guest |= guest_pebs_mask;
 	return arr;
 }
 
-- 
2.54.0.rc0.605.g598a273b03-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused
  2026-04-14 19:14 [PATCH 0/4] perf/x86: Don't write PEBS_ENABLED on KVM transitions Sean Christopherson
  2026-04-14 19:14 ` [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation Sean Christopherson
@ 2026-04-14 19:14 ` Sean Christopherson
  2026-04-14 21:31   ` Jim Mattson
  2026-04-14 19:14 ` [PATCH 3/4] perf/x86/intel: Make @data a mandatory param for intel_guest_get_msrs() Sean Christopherson
  2026-04-14 19:14 ` [PATCH 4/4] perf/x86: KVM: Have perf define a dedicated struct for getting guest PEBS data Sean Christopherson
  3 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2026-04-14 19:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Sean Christopherson, Paolo Bonzini
  Cc: linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

When filling the list of MSRs to be loaded by KVM on VM-Enter and VM-Exit,
insert DS_AREA and (conditionally) MSR_PEBS_DATA_CFG into the list if and
only if PEBS will be active in the guest, i.e. only if a PEBS record may be
generated while running the guest.  As shown by the !x86_pmu.pebs_ept path,
it's perfectly safe to run with the host's DS_AREA, so long as PEBS-enabled
counters are disabled via PERF_GLOBAL_CTRL.

Omitting DS_AREA and MSR_PEBS_DATA_CFG when PEBS is unused saves one MSR
write per MSR on each VMX transition, i.e. eliminates two/four pointless
MSR writes on each VMX roundtrip when PEBS isn't being used by the guest.

Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Cc: Jim Mattson <jmattson@google.com>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/intel/core.c | 41 ++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 002d809f82ef..20a153aa33cb 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5037,23 +5037,14 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 		return arr;
 	}
 
+	/*
+	 * If the guest won't use PEBS or the CPU doesn't support PEBS in the
+	 * guest, then there's nothing more to do as disabling PMCs via
+	 * PERF_GLOBAL_CTRL is sufficient on CPUs with guest/host isolation.
+	 */
 	if (!kvm_pmu || !x86_pmu.pebs_ept)
 		return arr;
 
-	arr[(*nr)++] = (struct perf_guest_switch_msr){
-		.msr = MSR_IA32_DS_AREA,
-		.host = (unsigned long)cpuc->ds,
-		.guest = kvm_pmu->ds_area,
-	};
-
-	if (x86_pmu.intel_cap.pebs_baseline) {
-		arr[(*nr)++] = (struct perf_guest_switch_msr){
-			.msr = MSR_PEBS_DATA_CFG,
-			.host = cpuc->active_pebs_data_cfg,
-			.guest = kvm_pmu->pebs_data_cfg,
-		};
-	}
-
 	/*
 	 * Disable counters where the guest PMC is different than the host PMC
 	 * being used on behalf of the guest, as the PEBS record includes
@@ -5065,6 +5056,28 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 	if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
 		guest_pebs_mask = 0;
 
+	/*
+	 * Context switch DS_AREA and PEBS_DATA_CFG if and only if PEBS will be
+	 * active in the guest; if no records will be generated while the guest
+	 * is running, then running with host values is safe (see above).
+	 */
+	if (!guest_pebs_mask)
+		return arr;
+
+	arr[(*nr)++] = (struct perf_guest_switch_msr){
+		.msr = MSR_IA32_DS_AREA,
+		.host = (unsigned long)cpuc->ds,
+		.guest = kvm_pmu->ds_area,
+	};
+
+	if (x86_pmu.intel_cap.pebs_baseline) {
+		arr[(*nr)++] = (struct perf_guest_switch_msr){
+			.msr = MSR_PEBS_DATA_CFG,
+			.host = cpuc->active_pebs_data_cfg,
+			.guest = kvm_pmu->pebs_data_cfg,
+		};
+	}
+
 	/*
 	 * Do NOT mess with PEBS_ENABLED.  As above, disabling counters via
 	 * PERF_GLOBAL_CTRL is sufficient, and loading a stale PEBS_ENABLED,
-- 
2.54.0.rc0.605.g598a273b03-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] perf/x86/intel: Make @data a mandatory param for intel_guest_get_msrs()
  2026-04-14 19:14 [PATCH 0/4] perf/x86: Don't write PEBS_ENABLED on KVM transitions Sean Christopherson
  2026-04-14 19:14 ` [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation Sean Christopherson
  2026-04-14 19:14 ` [PATCH 2/4] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused Sean Christopherson
@ 2026-04-14 19:14 ` Sean Christopherson
  2026-04-14 22:29   ` Jim Mattson
  2026-04-14 19:14 ` [PATCH 4/4] perf/x86: KVM: Have perf define a dedicated struct for getting guest PEBS data Sean Christopherson
  3 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2026-04-14 19:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Sean Christopherson, Paolo Bonzini
  Cc: linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

Drop "support" for passing a NULL @data/@kvm_pmu param when getting guest
MSRs.  KVM, the only in-tree user, unconditionally passes a non-NULL
pointer, and carrying code that suggests @data may be NULL is confusing,
e.g. incorrectly implies that there are scenarios where KVM doesn't pass
a PMU context.

Fixes: 8183a538cd95 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS")
Cc: Jim Mattson <jmattson@google.com>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/intel/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 20a153aa33cb..9a78731deea2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5038,11 +5038,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 	}
 
 	/*
-	 * If the guest won't use PEBS or the CPU doesn't support PEBS in the
-	 * guest, then there's nothing more to do as disabling PMCs via
-	 * PERF_GLOBAL_CTRL is sufficient on CPUs with guest/host isolation.
+	 * If the CPU doesn't support PEBS in the guest, then there's nothing
+	 * more to do as disabling PMCs via PERF_GLOBAL_CTRL is sufficient on
+	 * CPUs with guest/host isolation.
 	 */
-	if (!kvm_pmu || !x86_pmu.pebs_ept)
+	if (!x86_pmu.pebs_ept)
 		return arr;
 
 	/*
-- 
2.54.0.rc0.605.g598a273b03-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] perf/x86: KVM: Have perf define a dedicated struct for getting guest PEBS data
  2026-04-14 19:14 [PATCH 0/4] perf/x86: Don't write PEBS_ENABLED on KVM transitions Sean Christopherson
                   ` (2 preceding siblings ...)
  2026-04-14 19:14 ` [PATCH 3/4] perf/x86/intel: Make @data a mandatory param for intel_guest_get_msrs() Sean Christopherson
@ 2026-04-14 19:14 ` Sean Christopherson
  3 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2026-04-14 19:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Sean Christopherson, Paolo Bonzini
  Cc: linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

Have perf define a struct for getting guest PEBS data from KVM instead of
poking into the kvm_pmu structure.  Passing in an entire "struct kvm_pmu"
_as an opaque pointer_ to get at four fields is silly, especially since
one of the fields exists purely to convey information to perf, i.e. isn't
used by KVM.

Perf should also own its APIs, i.e. define what fields/data it needs, not
rely on KVM to throw fields into data structures that effectively hold
KVM-internal state.

Opportunistically rephrase the comment about cross-mapped counters to
explain *why* PEBS needs to be disabled.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/core.c            |  5 +++--
 arch/x86/events/intel/core.c      | 14 +++++++-------
 arch/x86/events/perf_event.h      |  3 ++-
 arch/x86/include/asm/kvm_host.h   |  9 ---------
 arch/x86/include/asm/perf_event.h | 12 ++++++++++--
 arch/x86/kvm/vmx/pmu_intel.c      | 20 +++++++++++++++++---
 arch/x86/kvm/vmx/vmx.c            | 11 +++++++----
 arch/x86/kvm/vmx/vmx.h            |  2 +-
 8 files changed, 47 insertions(+), 29 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 810ab21ffd99..e6f788e72e72 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -723,9 +723,10 @@ void x86_pmu_disable_all(void)
 	}
 }
 
-struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data)
+struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr,
+						  struct x86_guest_pebs *guest_pebs)
 {
-	return static_call(x86_pmu_guest_get_msrs)(nr, data);
+	return static_call(x86_pmu_guest_get_msrs)(nr, guest_pebs);
 }
 EXPORT_SYMBOL_FOR_KVM(perf_guest_get_msrs);
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 9a78731deea2..eded12764775 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -14,7 +14,6 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 #include <linux/nmi.h>
-#include <linux/kvm_host.h>
 
 #include <asm/cpufeature.h>
 #include <asm/debugreg.h>
@@ -4992,11 +4991,11 @@ static int intel_pmu_hw_config(struct perf_event *event)
  * when it uses {RD,WR}MSR, which should be handled by the KVM context,
  * specifically in the intel_pmu_{get,set}_msr().
  */
-static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
+static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr,
+							  struct x86_guest_pebs *guest_pebs)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
-	struct kvm_pmu *kvm_pmu = (struct kvm_pmu *)data;
 	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 	u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
 	u64 guest_pebs_mask = pebs_mask & ~cpuc->intel_ctrl_host_mask;
@@ -5052,7 +5051,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 	 * wrong counter(s).  Similarly, disallow PEBS in the guest if the host
 	 * is using PEBS, to avoid bleeding host state into PEBS records.
 	 */
-	guest_pebs_mask &= kvm_pmu->pebs_enable & ~kvm_pmu->host_cross_mapped_mask;
+	guest_pebs_mask &= guest_pebs->enable & ~guest_pebs->cross_mapped_mask;
 	if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
 		guest_pebs_mask = 0;
 
@@ -5067,14 +5066,14 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 	arr[(*nr)++] = (struct perf_guest_switch_msr){
 		.msr = MSR_IA32_DS_AREA,
 		.host = (unsigned long)cpuc->ds,
-		.guest = kvm_pmu->ds_area,
+		.guest = guest_pebs->ds_area,
 	};
 
 	if (x86_pmu.intel_cap.pebs_baseline) {
 		arr[(*nr)++] = (struct perf_guest_switch_msr){
 			.msr = MSR_PEBS_DATA_CFG,
 			.host = cpuc->active_pebs_data_cfg,
-			.guest = kvm_pmu->pebs_data_cfg,
+			.guest = guest_pebs->data_cfg,
 		};
 	}
 
@@ -5089,7 +5088,8 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 	return arr;
 }
 
-static struct perf_guest_switch_msr *core_guest_get_msrs(int *nr, void *data)
+static struct perf_guest_switch_msr *core_guest_get_msrs(int *nr,
+							 struct x86_guest_pebs *guest_pebs)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index fad87d3c8b2c..19d811ca6b05 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1023,7 +1023,8 @@ struct x86_pmu {
 	/*
 	 * Intel host/guest support (KVM)
 	 */
-	struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr, void *data);
+	struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr,
+							struct x86_guest_pebs *guest_pebs);
 
 	/*
 	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa..91b070168947 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -600,15 +600,6 @@ struct kvm_pmu {
 	u64 pebs_data_cfg;
 	u64 pebs_data_cfg_rsvd;
 
-	/*
-	 * If a guest counter is cross-mapped to host counter with different
-	 * index, its PEBS capability will be temporarily disabled.
-	 *
-	 * The user should make sure that this mask is updated
-	 * after disabling interrupts and before perf_guest_get_msrs();
-	 */
-	u64 host_cross_mapped_mask;
-
 	/*
 	 * The gate to release perf_events not marked in
 	 * pmc_in_use only once in a vcpu time slice.
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ff5acb8b199b..5340d8bb1d92 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -769,11 +769,19 @@ extern void perf_load_guest_lvtpc(u32 guest_lvtpc);
 extern void perf_put_guest_lvtpc(void);
 #endif
 
+struct x86_guest_pebs {
+	u64	enable;
+	u64	ds_area;
+	u64	data_cfg;
+	u64	cross_mapped_mask;
+};
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
-extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data);
+extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr,
+							 struct x86_guest_pebs *guest_pebs);
 extern void x86_perf_get_lbr(struct x86_pmu_lbr *lbr);
 #else
-struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data);
+struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr,
+						  struct x86_guest_pebs *guest_pebs);
 static inline void x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
 {
 	memset(lbr, 0, sizeof(*lbr));
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 27eb76e6b6a0..0197007593f3 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -736,11 +736,24 @@ static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 		intel_pmu_release_guest_lbr_event(vcpu);
 }
 
-void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
+u64 intel_pmu_get_cross_mapped_mask(struct kvm_pmu *pmu)
 {
-	struct kvm_pmc *pmc = NULL;
+	u64 host_cross_mapped_mask;
+	struct kvm_pmc *pmc;
 	int bit, hw_idx;
 
+	if (!(pmu->pebs_enable & pmu->global_ctrl))
+		return 0;
+
+	/*
+	 * Provide a mask of counters that are cross-mapped between the guest
+	 * and the host, i.e. where a guest PMC is mapped to a host PMC with a
+	 * different index.  PEBS records hold a PERF_GLOBAL_STATUS snapshot,
+	 * and so PEBS-enabled counters need to hold the correct index so as
+	 * not to confuse the guest.
+	 */
+	host_cross_mapped_mask = 0;
+
 	kvm_for_each_pmc(pmu, pmc, bit, (unsigned long *)&pmu->global_ctrl) {
 		if (!pmc_is_locally_enabled(pmc) ||
 		    !pmc_is_globally_enabled(pmc) || !pmc->perf_event)
@@ -752,8 +765,9 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
 		 */
 		hw_idx = pmc->perf_event->hw.idx;
 		if (hw_idx != pmc->idx && hw_idx > -1)
-			pmu->host_cross_mapped_mask |= BIT_ULL(hw_idx);
+			host_cross_mapped_mask |= BIT_ULL(hw_idx);
 	}
+	return host_cross_mapped_mask;
 }
 
 static bool intel_pmu_is_mediated_pmu_supported(struct x86_pmu_capability *host_pmu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a29896a9ef14..e6c1c64a8c94 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7313,12 +7313,15 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 	if (kvm_vcpu_has_mediated_pmu(&vmx->vcpu))
 		return;
 
-	pmu->host_cross_mapped_mask = 0;
-	if (pmu->pebs_enable & pmu->global_ctrl)
-		intel_pmu_cross_mapped_check(pmu);
+	struct x86_guest_pebs guest_pebs = {
+		.enable = pmu->pebs_enable,
+		.ds_area = pmu->ds_area,
+		.data_cfg = pmu->pebs_data_cfg,
+		.cross_mapped_mask = intel_pmu_get_cross_mapped_mask(pmu),
+	};
 
 	/* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. */
-	msrs = perf_guest_get_msrs(&nr_msrs, (void *)pmu);
+	msrs = perf_guest_get_msrs(&nr_msrs, &guest_pebs);
 	if (!msrs)
 		return;
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index db84e8001da5..0c4563472940 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -659,7 +659,7 @@ static __always_inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
-void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu);
+u64 intel_pmu_get_cross_mapped_mask(struct kvm_pmu *pmu);
 int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
 void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu);
 
-- 
2.54.0.rc0.605.g598a273b03-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused
  2026-04-14 19:14 ` [PATCH 2/4] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused Sean Christopherson
@ 2026-04-14 21:31   ` Jim Mattson
  2026-04-14 22:49     ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Jim Mattson @ 2026-04-14 21:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Paolo Bonzini, linux-perf-users, linux-kernel, kvm, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

On Tue, Apr 14, 2026 at 12:14 PM Sean Christopherson <seanjc@google.com> wrote:
>
> When filling the list of MSRs to be loaded by KVM on VM-Enter and VM-Exit,
> insert DS_AREA and (conditionally) MSR_PEBS_DATA_CFG into the list if and
> only if PEBS will be active in the guest, i.e. only if a PEBS record may be
> generated while running the guest.  As shown by the !x86_pmu.pebs_ept path,
> it's perfectly safe to run with the host's DS_AREA, so long as PEBS-enabled
> counters are disabled via PERF_GLOBAL_CTRL.
>
> Omitting DS_AREA and MSR_PEBS_DATA_CFG when PEBS is unused saves one MSR
> write per MSR on each VMX transition, i.e. eliminates two/four pointless
> MSR writes on each VMX roundtrip when PEBS isn't being used by the guest.
>
> Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Mingwei Zhang <mizhang@google.com>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/events/intel/core.c | 41 ++++++++++++++++++++++++------------
>  1 file changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 002d809f82ef..20a153aa33cb 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -5037,23 +5037,14 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>                 return arr;
>         }
>
> +       /*
> +        * If the guest won't use PEBS or the CPU doesn't support PEBS in the
> +        * guest, then there's nothing more to do as disabling PMCs via
> +        * PERF_GLOBAL_CTRL is sufficient on CPUs with guest/host isolation.
> +        */
>         if (!kvm_pmu || !x86_pmu.pebs_ept)
>                 return arr;
>
> -       arr[(*nr)++] = (struct perf_guest_switch_msr){
> -               .msr = MSR_IA32_DS_AREA,
> -               .host = (unsigned long)cpuc->ds,
> -               .guest = kvm_pmu->ds_area,
> -       };
> -
> -       if (x86_pmu.intel_cap.pebs_baseline) {
> -               arr[(*nr)++] = (struct perf_guest_switch_msr){
> -                       .msr = MSR_PEBS_DATA_CFG,
> -                       .host = cpuc->active_pebs_data_cfg,
> -                       .guest = kvm_pmu->pebs_data_cfg,
> -               };
> -       }
> -
>         /*
>          * Disable counters where the guest PMC is different than the host PMC
>          * being used on behalf of the guest, as the PEBS record includes
> @@ -5065,6 +5056,28 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>         if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
>                 guest_pebs_mask = 0;
>
> +       /*
> +        * Context switch DS_AREA and PEBS_DATA_CFG if and only if PEBS will be
> +        * active in the guest; if no records will be generated while the guest
> +        * is running, then running with host values is safe (see above).
> +        */
> +       if (!guest_pebs_mask)
> +               return arr;

I think this has an unintended side effect. If DS_AREA and
PEBS_DATA_CFG were previously listed (because pebs_guest_mask was
previously non-zero), KVM will leave the stale entries in the MSR-load
lists.

KVM only clears MSR-load list entries for enumerated MSRs with
matching guest and host values.

> +       arr[(*nr)++] = (struct perf_guest_switch_msr){
> +               .msr = MSR_IA32_DS_AREA,
> +               .host = (unsigned long)cpuc->ds,
> +               .guest = kvm_pmu->ds_area,
> +       };
> +
> +       if (x86_pmu.intel_cap.pebs_baseline) {
> +               arr[(*nr)++] = (struct perf_guest_switch_msr){
> +                       .msr = MSR_PEBS_DATA_CFG,
> +                       .host = cpuc->active_pebs_data_cfg,
> +                       .guest = kvm_pmu->pebs_data_cfg,
> +               };
> +       }
> +
>         /*
>          * Do NOT mess with PEBS_ENABLED.  As above, disabling counters via
>          * PERF_GLOBAL_CTRL is sufficient, and loading a stale PEBS_ENABLED,
> --
> 2.54.0.rc0.605.g598a273b03-goog
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] perf/x86/intel: Make @data a mandatory param for intel_guest_get_msrs()
  2026-04-14 19:14 ` [PATCH 3/4] perf/x86/intel: Make @data a mandatory param for intel_guest_get_msrs() Sean Christopherson
@ 2026-04-14 22:29   ` Jim Mattson
  0 siblings, 0 replies; 13+ messages in thread
From: Jim Mattson @ 2026-04-14 22:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Paolo Bonzini, linux-perf-users, linux-kernel, kvm, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

On Tue, Apr 14, 2026 at 12:14 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Drop "support" for passing a NULL @data/@kvm_pmu param when getting guest
> MSRs.  KVM, the only in-tree user, unconditionally passes a non-NULL
> pointer, and carrying code that suggests @data may be NULL is confusing,
> e.g. incorrectly implies that there are scenarios where KVM doesn't pass
> a PMU context.
>
> Fixes: 8183a538cd95 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS")
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Mingwei Zhang <mizhang@google.com>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused
  2026-04-14 21:31   ` Jim Mattson
@ 2026-04-14 22:49     ` Sean Christopherson
  2026-04-15 13:00       ` Jim Mattson
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2026-04-14 22:49 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Paolo Bonzini, linux-perf-users, linux-kernel, kvm, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

On Tue, Apr 14, 2026, Jim Mattson wrote:
> On Tue, Apr 14, 2026 at 12:14 PM Sean Christopherson <seanjc@google.com> wrote:
> > @@ -5065,6 +5056,28 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> >         if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
> >                 guest_pebs_mask = 0;
> >
> > +       /*
> > +        * Context switch DS_AREA and PEBS_DATA_CFG if and only if PEBS will be
> > +        * active in the guest; if no records will be generated while the guest
> > +        * is running, then running with host values is safe (see above).
> > +        */
> > +       if (!guest_pebs_mask)
> > +               return arr;
> 
> I think this has an unintended side effect. If DS_AREA and
> PEBS_DATA_CFG were previously listed (because guest_pebs_mask was
> previously non-zero), KVM will leave the stale entries in the MSR-load
> lists.
> 
> KVM only clears MSR-load list entries for enumerated MSRs with
> matching guest and host values.

Argh, right.  This as a delta change I guess?  I'd love to overhaul the interface,
but reworking KVM to play nice with core_guest_get_msrs() would be much more
difficult.

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 20a153aa33cb..3244d5589981 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5059,22 +5059,20 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
        /*
         * Context switch DS_AREA and PEBS_DATA_CFG if and only if PEBS will be
         * active in the guest; if no records will be generated while the guest
-        * is running, then running with host values is safe (see above).
+        * is running, then simply keep the host values resident in hardware.
         */
-       if (!guest_pebs_mask)
-               return arr;
-
        arr[(*nr)++] = (struct perf_guest_switch_msr){
                .msr = MSR_IA32_DS_AREA,
                .host = (unsigned long)cpuc->ds,
-               .guest = kvm_pmu->ds_area,
+               .guest = guest_pebs_mask ? kvm_pmu->ds_area : (unsigned long)cpuc->ds,
        };
 
        if (x86_pmu.intel_cap.pebs_baseline) {
                arr[(*nr)++] = (struct perf_guest_switch_msr){
                        .msr = MSR_PEBS_DATA_CFG,
                        .host = cpuc->active_pebs_data_cfg,
-                       .guest = kvm_pmu->pebs_data_cfg,
+                       .guest = guest_pebs_mask ? kvm_pmu->pebs_data_cfg :
+                                                  cpuc->active_pebs_data_cfg,
                };
        }
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused
  2026-04-14 22:49     ` Sean Christopherson
@ 2026-04-15 13:00       ` Jim Mattson
  0 siblings, 0 replies; 13+ messages in thread
From: Jim Mattson @ 2026-04-15 13:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	Paolo Bonzini, linux-perf-users, linux-kernel, kvm, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

On Tue, Apr 14, 2026 at 3:49 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Apr 14, 2026, Jim Mattson wrote:
> > On Tue, Apr 14, 2026 at 12:14 PM Sean Christopherson <seanjc@google.com> wrote:
> > > @@ -5065,6 +5056,28 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> > >         if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
> > >                 guest_pebs_mask = 0;
> > >
> > > +       /*
> > > +        * Context switch DS_AREA and PEBS_DATA_CFG if and only if PEBS will be
> > > +        * active in the guest; if no records will be generated while the guest
> > > +        * is running, then running with host values is safe (see above).
> > > +        */
> > > +       if (!guest_pebs_mask)
> > > +               return arr;
> >
> > I think this has an unintended side effect. If DS_AREA and
> > PEBS_DATA_CFG were previously listed (because guest_pebs_mask was
> > previously non-zero), KVM will leave the stale entries in the MSR-load
> > lists.
> >
> > KVM only clears MSR-load list entries for enumerated MSRs with
> > matching guest and host values.
>
> Argh, right.  This as a delta change I guess?  I'd love to overhaul the interface,
> but reworking KVM to play nice with core_guest_get_msrs() would be much more
> difficult.
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 20a153aa33cb..3244d5589981 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -5059,22 +5059,20 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>         /*
>          * Context switch DS_AREA and PEBS_DATA_CFG if and only if PEBS will be
>          * active in the guest; if no records will be generated while the guest
> -        * is running, then running with host values is safe (see above).
> +        * is running, then simply keep the host values resident in hardware.
>          */
> -       if (!guest_pebs_mask)
> -               return arr;
> -
>         arr[(*nr)++] = (struct perf_guest_switch_msr){
>                 .msr = MSR_IA32_DS_AREA,
>                 .host = (unsigned long)cpuc->ds,
> -               .guest = kvm_pmu->ds_area,
> +               .guest = guest_pebs_mask ? kvm_pmu->ds_area : (unsigned long)cpuc->ds,
>         };
>
>         if (x86_pmu.intel_cap.pebs_baseline) {
>                 arr[(*nr)++] = (struct perf_guest_switch_msr){
>                         .msr = MSR_PEBS_DATA_CFG,
>                         .host = cpuc->active_pebs_data_cfg,
> -                       .guest = kvm_pmu->pebs_data_cfg,
> +                       .guest = guest_pebs_mask ? kvm_pmu->data_cfg :
> +                       .guest = guest_pebs_mask ? kvm_pmu->pebs_data_cfg :
>                 };
>         }
>
That looks reasonable, if a bit obtuse.

Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation
  2026-04-14 19:14 ` [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation Sean Christopherson
@ 2026-04-16 18:24   ` Namhyung Kim
  2026-04-16 19:38     ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2026-04-16 18:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, Paolo Bonzini,
	linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

Hello,

On Tue, Apr 14, 2026 at 12:14:22PM -0700, Sean Christopherson wrote:
> When filling the list of MSRs to be loaded by KVM on VM-Enter and VM-Exit,
> *never* insert an entry for PEBS_ENABLED if the CPU properly isolates PEBS
> events, in which case disabling counters via PERF_GLOBAL_CTRL is sufficient
> to prevent unwanted PEBS events in the guest (or host).  Because perf loads
> PEBS_ENABLE with the unfiltered cpu_hw_events.pebs_enabled, i.e. with both
> host and guest masks, there is no need to load different values for the
> guest versus host, perf+KVM can and should simply control which counters
> are enabled/disabled via PERF_GLOBAL_CTRL.
> 
> Avoiding writes to PEBS_ENABLED fixes a bug where PEBS_ENABLED can end up
> with "stuck" bits if a PEBS event is throttled between generating the list
> and actually entering the guest (Intel CPUs can't arbitrarily block NMIs).
> And stating the obvious, leaving PEBS_ENABLED as-is avoids two MSR writes
> on every VMX transition.
> 
> Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Mingwei Zhang <mizhang@google.com>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/events/intel/core.c | 42 ++++++++++++++++++++----------------
>  1 file changed, 23 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 793335c3ce78..002d809f82ef 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4999,12 +4999,15 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>  	struct kvm_pmu *kvm_pmu = (struct kvm_pmu *)data;
>  	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
>  	u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
> -	int global_ctrl, pebs_enable;
> +	u64 guest_pebs_mask = pebs_mask & ~cpuc->intel_ctrl_host_mask;
> +	int global_ctrl;
>  
>  	/*
>  	 * In addition to obeying exclude_guest/exclude_host, remove bits being
>  	 * used for PEBS when running a guest, because PEBS writes to virtual
> -	 * addresses (not physical addresses).
> +	 * addresses (not physical addresses).  If the guest wants to utilize
> +	 * PEBS, and PEBS can be safely enabled in the guest, bits for the guest's
> +	 * PEBS-enabled counters will be OR'd back in as appropriate.
>  	 */
>  	*nr = 0;
>  	global_ctrl = (*nr)++;
> @@ -5051,24 +5054,25 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>  		};
>  	}
>  
> -	pebs_enable = (*nr)++;
> -	arr[pebs_enable] = (struct perf_guest_switch_msr){
> -		.msr = MSR_IA32_PEBS_ENABLE,
> -		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
> -		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable,
> -	};
> -
> -	if (arr[pebs_enable].host) {
> -		/* Disable guest PEBS if host PEBS is enabled. */
> -		arr[pebs_enable].guest = 0;
> -	} else {
> -		/* Disable guest PEBS thoroughly for cross-mapped PEBS counters. */
> -		arr[pebs_enable].guest &= ~kvm_pmu->host_cross_mapped_mask;
> -		arr[global_ctrl].guest &= ~kvm_pmu->host_cross_mapped_mask;
> -		/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
> -		arr[global_ctrl].guest |= arr[pebs_enable].guest;
> -	}
> +	/*
> +	 * Disable counters where the guest PMC is different than the host PMC
> +	 * being used on behalf of the guest, as the PEBS record includes
> +	 * PERF_GLOBAL_STATUS, i.e. the guest will see overflow status for the
> +	 * wrong counter(s).  Similarly, disallow PEBS in the guest if the host
> +	 * is using PEBS, to avoid bleeding host state into PEBS records.
> +	 */
> +	guest_pebs_mask &= kvm_pmu->pebs_enable & ~kvm_pmu->host_cross_mapped_mask;
> +	if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
> +		guest_pebs_mask = 0;
>  
> +	/*
> +	 * Do NOT mess with PEBS_ENABLED.  As above, disabling counters via
> +	 * PERF_GLOBAL_CTRL is sufficient, and loading a stale PEBS_ENABLED,
> +	 * e.g. on VM-Exit, can put the system in a bad state.  Simply enable
> +	 * counters in PERF_GLOBAL_CTRL, as perf loads PEBS_ENABLED with the
> +	 * full value, i.e. perf *also* relies on PERF_GLOBAL_CTRL.
> +	 */
> +	arr[global_ctrl].guest |= guest_pebs_mask;

I was confused by the earlier comment in the function that says it is not
enough to disable counters but I've realized it's only for the case PEBS
isolation is not supported by CPU/ucode.

I think it's ok for disabling guest PEBS, but I'm curious if there's a
case to enable PEBS only in guest and it'd be handled correctly.

Thanks,
Namhyung

>  	return arr;
>  }
>  
> -- 
> 2.54.0.rc0.605.g598a273b03-goog
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation
  2026-04-16 18:24   ` Namhyung Kim
@ 2026-04-16 19:38     ` Sean Christopherson
  2026-04-16 23:51       ` Namhyung Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2026-04-16 19:38 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, Paolo Bonzini,
	linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

On Thu, Apr 16, 2026, Namhyung Kim wrote:
> > +	/*
> > +	 * Disable counters where the guest PMC is different than the host PMC
> > +	 * being used on behalf of the guest, as the PEBS record includes
> > +	 * PERF_GLOBAL_STATUS, i.e. the guest will see overflow status for the
> > +	 * wrong counter(s).  Similarly, disallow PEBS in the guest if the host
> > +	 * is using PEBS, to avoid bleeding host state into PEBS records.
> > +	 */
> > +	guest_pebs_mask &= kvm_pmu->pebs_enable & ~kvm_pmu->host_cross_mapped_mask;
> > +	if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
> > +		guest_pebs_mask = 0;
> >  
> > +	/*
> > +	 * Do NOT mess with PEBS_ENABLED.  As above, disabling counters via
> > +	 * PERF_GLOBAL_CTRL is sufficient, and loading a stale PEBS_ENABLED,
> > +	 * e.g. on VM-Exit, can put the system in a bad state.  Simply enable
> > +	 * counters in PERF_GLOBAL_CTRL, as perf loads PEBS_ENABLED with the
> > +	 * full value, i.e. perf *also* relies on PERF_GLOBAL_CTRL.
> > +	 */
> > +	arr[global_ctrl].guest |= guest_pebs_mask;
> 
> I was confused by the earlier comment in the function that says it is not
> enough to disable counters but I've realized it's only for the case PEBS
> isolation is not supported by CPU/ucode.

Yeah, me too, more than once. :-/

> I think it's ok for disabling guest PEBS, but I'm curious if there's a
> case to enable PEBS only in guest and it'd be handled correctly.

Yep, if PEBS is being virtualized for the guest and the host isn't also profiling,
then PEBS will be active for the guest but not the host.  KVM tests for PEBS pass,
and while they aren't exactly comprehensive, they should detect outright breakage.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation
  2026-04-16 19:38     ` Sean Christopherson
@ 2026-04-16 23:51       ` Namhyung Kim
  2026-04-17  0:23         ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2026-04-16 23:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, Paolo Bonzini,
	linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

On Thu, Apr 16, 2026 at 12:38:49PM -0700, Sean Christopherson wrote:
> On Thu, Apr 16, 2026, Namhyung Kim wrote:
> > > +	/*
> > > +	 * Disable counters where the guest PMC is different than the host PMC
> > > +	 * being used on behalf of the guest, as the PEBS record includes
> > > +	 * PERF_GLOBAL_STATUS, i.e. the guest will see overflow status for the
> > > +	 * wrong counter(s).  Similarly, disallow PEBS in the guest if the host
> > > +	 * is using PEBS, to avoid bleeding host state into PEBS records.
> > > +	 */
> > > +	guest_pebs_mask &= kvm_pmu->pebs_enable & ~kvm_pmu->host_cross_mapped_mask;
> > > +	if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
> > > +		guest_pebs_mask = 0;
> > >  
> > > +	/*
> > > +	 * Do NOT mess with PEBS_ENABLED.  As above, disabling counters via
> > > +	 * PERF_GLOBAL_CTRL is sufficient, and loading a stale PEBS_ENABLED,
> > > +	 * e.g. on VM-Exit, can put the system in a bad state.  Simply enable
> > > +	 * counters in PERF_GLOBAL_CTRL, as perf loads PEBS_ENABLED with the
> > > +	 * full value, i.e. perf *also* relies on PERF_GLOBAL_CTRL.
> > > +	 */
> > > +	arr[global_ctrl].guest |= guest_pebs_mask;
> > 
> > I was confused by the earlier comment in the function that says it is not
> > enough to disable counters but I've realized it's only for the case PEBS
> > isolation is not supported by CPU/ucode.
> 
> Yeah, me too, more than once. :-/
> 
> > I think it's ok for disabling guest PEBS, but I'm curious if there's a
> > case to enable PEBS only in guest and it'd be handled correctly.
> 
> Yep, if PEBS is being virtualized for the guest and the host isn't also profiling,
> then PEBS will be active for the guest but not the host.  KVM tests for PEBS pass,
> and while they aren't exactly comprehensive, they should detect outright breakage.

In that case, wouldn't it need to update PEBS_ENABLED here?

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation
  2026-04-16 23:51       ` Namhyung Kim
@ 2026-04-17  0:23         ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2026-04-17  0:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, Paolo Bonzini,
	linux-perf-users, linux-kernel, kvm, Jim Mattson, Mingwei Zhang,
	Stephane Eranian, Dapeng Mi

On Thu, Apr 16, 2026, Namhyung Kim wrote:
> On Thu, Apr 16, 2026 at 12:38:49PM -0700, Sean Christopherson wrote:
> > On Thu, Apr 16, 2026, Namhyung Kim wrote:
> > > > +	/*
> > > > +	 * Disable counters where the guest PMC is different than the host PMC
> > > > +	 * being used on behalf of the guest, as the PEBS record includes
> > > > +	 * PERF_GLOBAL_STATUS, i.e. the guest will see overflow status for the
> > > > +	 * wrong counter(s).  Similarly, disallow PEBS in the guest if the host
> > > > +	 * is using PEBS, to avoid bleeding host state into PEBS records.
> > > > +	 */
> > > > +	guest_pebs_mask &= kvm_pmu->pebs_enable & ~kvm_pmu->host_cross_mapped_mask;
> > > > +	if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
> > > > +		guest_pebs_mask = 0;
> > > >  
> > > > +	/*
> > > > +	 * Do NOT mess with PEBS_ENABLED.  As above, disabling counters via
> > > > +	 * PERF_GLOBAL_CTRL is sufficient, and loading a stale PEBS_ENABLED,
> > > > +	 * e.g. on VM-Exit, can put the system in a bad state.  Simply enable
> > > > +	 * counters in PERF_GLOBAL_CTRL, as perf loads PEBS_ENABLED with the
> > > > +	 * full value, i.e. perf *also* relies on PERF_GLOBAL_CTRL.
> > > > +	 */
> > > > +	arr[global_ctrl].guest |= guest_pebs_mask;
> > > 
> > > I was confused by the earlier comment in the function that says it is not
> > > enough to disable counters but I've realized it's only for the case PEBS
> > > isolation is not supported by CPU/ucode.
> > 
> > Yeah, me too, more than once. :-/
> > 
> > > I think it's ok for disabling guest PEBS, but I'm curious if there's a
> > > case to enable PEBS only in guest and it'd be handled correctly.
> > 
> > Yep, if PEBS is being virtualized for the guest and the host isn't also profiling,
> > then PEBS will be active for the guest but not the host.  KVM tests for PEBS pass,
> > and while they aren't exactly comprehensive, they should detect outright breakage.
> 
> In that case, wouldn't it need to update PEBS_ENABLED here?

No, because KVM only supports guest usage of PEBS when KVM is proxying the guest
PMU via perf.  I.e. when perf fully controls PEBS_ENABLED.  On perf's side, events
created by KVM show up as "guest only", but are otherwise programmed normally,
including getting shoved into PEBS_ENABLED as needed.  So to activate PEBS PMCs
for the guest, KVM just needs to load PERF_GLOBAL_CTRL with the counters that
should be enabled for the guest (taking care to not load "guest only" PEBS events
that were (stupidly) created by host userspace, as those can crash the guest).

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-04-17  0:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-14 19:14 [PATCH 0/4] perf/x86: Don't write PEBS_ENABLED on KVM transitions Sean Christopherson
2026-04-14 19:14 ` [PATCH 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation Sean Christopherson
2026-04-16 18:24   ` Namhyung Kim
2026-04-16 19:38     ` Sean Christopherson
2026-04-16 23:51       ` Namhyung Kim
2026-04-17  0:23         ` Sean Christopherson
2026-04-14 19:14 ` [PATCH 2/4] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused Sean Christopherson
2026-04-14 21:31   ` Jim Mattson
2026-04-14 22:49     ` Sean Christopherson
2026-04-15 13:00       ` Jim Mattson
2026-04-14 19:14 ` [PATCH 3/4] perf/x86/intel: Make @data a mandatory param for intel_guest_get_msrs() Sean Christopherson
2026-04-14 22:29   ` Jim Mattson
2026-04-14 19:14 ` [PATCH 4/4] perf/x86: KVM: Have perf define a dedicated struct for getting guest PEBS data Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox