Linux Perf Users
* [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes
@ 2026-06-05  1:11 Dapeng Mi
  2026-06-05  1:11 ` [PATCH 1/8] perf/x86/intel: Remove anythread_deprecated bit from perf_capabilities Dapeng Mi
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi

This series groups several independent PMU fixes to simplify review and
backporting.

Patch layout:
- Patch 1/8: Fix the anythread_deprecated bit being overwritten.
- Patches 2-3/8: Fix cap_user_rdpmc not being updated correctly.
- Patches 4-5/8: Fix a kernel address leak in the LBR stack.
- Patch 6/8: Fix the missing validation of the return value of
  intel_pmu_init_hybrid().
- Patch 7/8: Fix an "unchecked MSR access error" on the PEBS_ENABLE MSR.
- Patch 8/8: Prevent a theoretical kernel register data leak in sampling.


Dapeng Mi (7):
  perf/x86/intel: Remove anythread_deprecated bit from perf_capabilities
  perf/x86: Update cap_user_rdpmc base on rdpmc user disable state
  perf/x86/intel: Fix redundant branch type check in
    intel_pmu_lbr_filter()
  perf/x86/intel: Fix kernel address leakages in LBR stack
  perf/x86/intel: Validate return value of intel_pmu_init_hybrid()
  perf/x86/intel: Drop fixed-counter PEBS constraints for baseline PEBS
  perf/core: Fix kernel register info leak via hardware skid

Ian Rogers (1):
  perf/x86: Introduce is_x86_pmu() helper

 arch/x86/events/core.c       | 19 +++-------------
 arch/x86/events/intel/core.c | 43 ++++++++++++++++++++++++------------
 arch/x86/events/intel/ds.c   | 13 -----------
 arch/x86/events/intel/lbr.c  | 13 ++++++++---
 arch/x86/events/perf_event.h | 25 +++++++++++++++++----
 kernel/events/core.c         | 20 +++++++++++++----
 6 files changed, 79 insertions(+), 54 deletions(-)


base-commit: 66cc29745f2f5815482587bb9fbc1e8a3e6fcf00
-- 
2.34.1



* [PATCH 1/8] perf/x86/intel: Remove anythread_deprecated bit from perf_capabilities
  2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
@ 2026-06-05  1:11 ` Dapeng Mi
  2026-06-05  1:11 ` [PATCH 2/8] perf/x86: Introduce is_x86_pmu() helper Dapeng Mi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi, stable

AnyThread mode deprecation is enumerated by CPUID.0AH:EDX[15], not by the
PERF_CAPABILITIES MSR, so it is not good practice to define an
"anythread deprecated" bit in perf_capabilities. Doing so allows the
anythread_deprecated bit to be overwritten by the real value of the
PERF_CAPABILITIES MSR, which is exactly what the code below in
update_pmu_cap() does.

```
if (!intel_pmu_broken_perf_cap()) {
	/* Perf Metric (Bit 15) and PEBS via PT (Bit 16) are hybrid enumeration */
	rdmsrq(MSR_IA32_PERF_CAPABILITIES, hybrid(pmu, intel_cap).capabilities);
}
```

As a result, the anythread_deprecated bit is cleared to 0 and the "any"
attribute is incorrectly shown in the /sys/devices/cpu/format/ directory on
platforms that support Perfmon v6, such as Clearwater Forest.

```
$ grep . /sys/devices/cpu/format/*
/sys/devices/cpu/format/acr_mask:config2:0-63
/sys/devices/cpu/format/any:config:21
/sys/devices/cpu/format/cmask:config:24-31
```

So remove the anythread_deprecated bit from the perf_capabilities structure
and rely directly on CPUID.0AH:EDX[15] to determine whether AnyThread is
deprecated.

Cc: stable@vger.kernel.org
Reported-by: Namhyung Kim <namhyung@kernel.org>
Fixes: cadbaa039b99 ("perf/x86/intel: Make anythread filter support conditional")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
---

Original patch link:
https://lore.kernel.org/all/20260423053306.3033331-1-dapeng1.mi@linux.intel.com/

 arch/x86/events/intel/core.c | 10 +++-------
 arch/x86/events/perf_event.h |  2 +-
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 0217e701aeeb..ea3ab3050a3b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -7946,12 +7946,6 @@ __init int intel_pmu_init(void)
 
 	x86_add_quirk(intel_arch_events_quirk); /* Install first, so it runs last */
 
-	if (version >= 5) {
-		x86_pmu.intel_cap.anythread_deprecated = edx.split.anythread_deprecated;
-		if (x86_pmu.intel_cap.anythread_deprecated)
-			pr_cont(" AnyThread deprecated, ");
-	}
-
 	/* The perf side of core PMU is ready to support the mediated vPMU. */
 	x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_MEDIATED_VPMU;
 
@@ -8828,8 +8822,10 @@ __init int intel_pmu_init(void)
 				      &x86_pmu.intel_ctrl);
 
 	/* AnyThread may be deprecated on arch perfmon v5 or later */
-	if (x86_pmu.intel_cap.anythread_deprecated)
+	if (version >= 5 && edx.split.anythread_deprecated) {
 		x86_pmu.format_attrs = intel_arch_formats_attr;
+		pr_cont("AnyThread deprecated, ");
+	}
 
 	intel_pmu_check_event_constraints_all(NULL);
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index eae24bb35dc1..5902a297daa1 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -668,7 +668,7 @@ union perf_capabilities {
 		u64	perf_metrics:1;
 		u64	pebs_output_pt_available:1;
 		u64	pebs_timing_info:1;
-		u64	anythread_deprecated:1;
+		u64	__reserved:1;
 		u64	rdpmc_metrics_clear:1;
 	};
 	u64	capabilities;
-- 
2.34.1



* [PATCH 2/8] perf/x86: Introduce is_x86_pmu() helper
  2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
  2026-06-05  1:11 ` [PATCH 1/8] perf/x86/intel: Remove anythread_deprecated bit from perf_capabilities Dapeng Mi
@ 2026-06-05  1:11 ` Dapeng Mi
  2026-06-05  1:11 ` [PATCH 3/8] perf/x86: Update cap_user_rdpmc base on rdpmc user disable state Dapeng Mi
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi

From: Ian Rogers <irogers@google.com>

To facilitate the detection of x86 PMU structures in upcoming patches,
introduce the is_x86_pmu() helper. Additionally, refactor the
is_x86_event() helper to use is_x86_pmu().

No functional change intended.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
---

Original patch link:
https://lore.kernel.org/all/20260316050838.3624051-1-dapeng1.mi@linux.intel.com/

 arch/x86/events/core.c       | 16 ----------------
 arch/x86/events/perf_event.h | 18 +++++++++++++++++-
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 4b9e105309c6..3bd0522afe6d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -774,22 +774,6 @@ void x86_pmu_enable_all(int added)
 	}
 }
 
-int is_x86_event(struct perf_event *event)
-{
-	/*
-	 * For a non-hybrid platforms, the type of X86 pmu is
-	 * always PERF_TYPE_RAW.
-	 * For a hybrid platform, the PERF_PMU_CAP_EXTENDED_HW_TYPE
-	 * is a unique capability for the X86 PMU.
-	 * Use them to detect a X86 event.
-	 */
-	if (event->pmu->type == PERF_TYPE_RAW ||
-	    event->pmu->capabilities & PERF_PMU_CAP_EXTENDED_HW_TYPE)
-		return true;
-
-	return false;
-}
-
 struct pmu *x86_get_pmu(unsigned int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 5902a297daa1..dbb5c8e8a8ea 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -115,7 +115,23 @@ static inline bool is_topdown_event(struct perf_event *event)
 	return is_metric_event(event) || is_slots_event(event);
 }
 
-int is_x86_event(struct perf_event *event);
+static inline bool is_x86_pmu(struct pmu *pmu)
+{
+	/*
+	 * For a non-hybrid platforms, the type of X86 pmu is
+	 * always PERF_TYPE_RAW.
+	 * For a hybrid platform, the PERF_PMU_CAP_EXTENDED_HW_TYPE
+	 * is a unique capability for the X86 PMU.
+	 * Use them to detect a X86 event.
+	 */
+	return pmu->type == PERF_TYPE_RAW ||
+	       pmu->capabilities & PERF_PMU_CAP_EXTENDED_HW_TYPE;
+}
+
+static inline bool is_x86_event(struct perf_event *event)
+{
+	return is_x86_pmu(event->pmu);
+}
 
 static inline bool check_leader_group(struct perf_event *leader, int flags)
 {
-- 
2.34.1



* [PATCH 3/8] perf/x86: Update cap_user_rdpmc base on rdpmc user disable state
  2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
  2026-06-05  1:11 ` [PATCH 1/8] perf/x86/intel: Remove anythread_deprecated bit from perf_capabilities Dapeng Mi
  2026-06-05  1:11 ` [PATCH 2/8] perf/x86: Introduce is_x86_pmu() helper Dapeng Mi
@ 2026-06-05  1:11 ` Dapeng Mi
  2026-06-05  1:11 ` [PATCH 4/8] perf/x86/intel: Fix redundant branch type check in intel_pmu_lbr_filter() Dapeng Mi
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi

After the introduction of the RDPMC user disable feature, user-space RDPMC
may return 0 instead of the actual event count. This is inconsistent with
cap_user_rdpmc: cap_user_rdpmc is set, but user-space RDPMC only returns 0.

To accurately represent the user-space RDPMC capability, update
cap_user_rdpmc based on the RDPMC user disable state. If RDPMC user
disable is enabled, cap_user_rdpmc is set to false, allowing user-space
programs to fall back to the read() syscall to obtain the real event
count.

Since arch_perf_update_userpage() can also be called for software events,
enhance x86_pmu_has_rdpmc_user_disable() to check only x86 PMUs.

Fixes: 59af95e028d4 ("perf/x86/intel: Add support for rdpmc user disable feature")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
---

Original patch link:
https://lore.kernel.org/all/20260316050838.3624051-2-dapeng1.mi@linux.intel.com/

 arch/x86/events/core.c       | 3 +++
 arch/x86/events/perf_event.h | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 3bd0522afe6d..6cd95b8e31cb 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2797,6 +2797,9 @@ void arch_perf_update_userpage(struct perf_event *event,
 	userpg->cap_user_time_zero = 0;
 	userpg->cap_user_rdpmc =
 		!!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT);
+	if (x86_pmu_has_rdpmc_user_disable(event->pmu) &&
+	    event->hw.config & ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE)
+		userpg->cap_user_rdpmc = 0;
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
 	if (!using_native_sched_clock() || !sched_clock_stable())
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index dbb5c8e8a8ea..4003e2e0aa9c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1359,8 +1359,9 @@ static inline u64 x86_pmu_get_event_config(struct perf_event *event)
 
 static inline bool x86_pmu_has_rdpmc_user_disable(struct pmu *pmu)
 {
-	return !!(hybrid(pmu, config_mask) &
-		 ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE);
+	return is_x86_pmu(pmu) &&
+	       (hybrid(pmu, config_mask) &
+		ARCH_PERFMON_EVENTSEL_RDPMC_USER_DISABLE);
 }
 
 extern struct event_constraint emptyconstraint;
-- 
2.34.1



* [PATCH 4/8] perf/x86/intel: Fix redundant branch type check in intel_pmu_lbr_filter()
  2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
                   ` (2 preceding siblings ...)
  2026-06-05  1:11 ` [PATCH 3/8] perf/x86: Update cap_user_rdpmc base on rdpmc user disable state Dapeng Mi
@ 2026-06-05  1:11 ` Dapeng Mi
  2026-06-05  1:11 ` [PATCH 5/8] perf/x86/intel: Fix kernel address leakages in LBR stack Dapeng Mi
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi, stable

In intel_pmu_lbr_filter(), the 'type' variable is bitwise ORed with
'to_plm' (which contains X86_BR_USER and/or X86_BR_KERNEL bits). Because
of this, 'type' can never equal X86_BR_NONE (0) after the assignment.

As a result, the subsequent check 'if (type == X86_BR_NONE)' is dead code,
and entries with the X86_BR_NONE type are never skipped.

Correct this by masking out the X86_BR_KERNEL and X86_BR_USER bits
before performing the X86_BR_NONE comparison.

Cc: stable@vger.kernel.org
Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---

Original patch link:
https://lore.kernel.org/all/20260414021440.928068-1-dapeng1.mi@linux.intel.com/

 arch/x86/events/intel/lbr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 72f2adcda7c6..16977e4c6f8a 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1245,7 +1245,7 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 		}
 
 		/* if type does not correspond, then discard */
-		if (type == X86_BR_NONE || (br_sel & type) != type) {
+		if ((type & ~X86_BR_PLM) == X86_BR_NONE || (br_sel & type) != type) {
 			cpuc->lbr_entries[i].from = 0;
 			compress = true;
 		}
-- 
2.34.1



* [PATCH 5/8] perf/x86/intel: Fix kernel address leakages in LBR stack
  2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
                   ` (3 preceding siblings ...)
  2026-06-05  1:11 ` [PATCH 4/8] perf/x86/intel: Fix redundant branch type check in intel_pmu_lbr_filter() Dapeng Mi
@ 2026-06-05  1:11 ` Dapeng Mi
  2026-06-05  1:33   ` sashiko-bot
  2026-06-05  1:11 ` [PATCH 6/8] perf/x86/intel: Validate return value of intel_pmu_init_hybrid() Dapeng Mi
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi, stable

Prior to arch-LBR, which supports CPL filtering, kernel addresses could be
leaked to user space even when PERF_SAMPLE_BRANCH_USER is requested.

For example, run the following command on an Intel Tigerlake platform:

```
$./perf record -e cycles:p -o - --branch-filter any,save_type,u -- \
 	./perf bench syscall basic --loop 1000 | \
	./perf script -i - --fields brstack|tr ' ' '\n'| \
	grep -E '0x[89a-f][0-9a-f]{15}'

    Total time: 0.000 [sec]

      0.219000 usecs/op
     4,566,210 ops/sec
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.551 MB - ]
0xffffffff93c001c8/0x7f12a2b1d647/P/-/-/16959/SYSRET/-
0xffffffff93c001c8/0x7f12a2b1d5c2/P/-/-/17535/SYSRET/-
0xffffffff93c01928/0x7f12a2861000/P/-/-/6719/ERET/-
0xffffffff93c01928/0x7f12a297a000/P/-/-/8575/ERET/-
```
SYSRET/ERET branches are found in the LBR stack, and their "from"
addresses are clearly kernel addresses.

Currently, intel_pmu_lbr_filter() only filters out LBR entries whose "to"
address is a kernel address; it does not check the "from" address.

To fix the issue, extend the software filtering to both "from" and "to"
addresses.

Cc: stable@vger.kernel.org
Reported-by: Ian Rogers <irogers@google.com>
Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---

Original patch link:
https://lore.kernel.org/all/20260414021440.928068-2-dapeng1.mi@linux.intel.com/

 arch/x86/events/intel/lbr.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 16977e4c6f8a..deef81c16571 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1212,7 +1212,7 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 {
 	u64 from, to;
 	int br_sel = cpuc->br_sel;
-	int i, j, type, to_plm;
+	int i, j, type, to_plm, from_plm;
 	bool compress = false;
 
 	/* if sampling all branches, then nothing to filter */
@@ -1244,8 +1244,15 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 				type |= X86_BR_NO_TX;
 		}
 
-		/* if type does not correspond, then discard */
-		if ((type & ~X86_BR_PLM) == X86_BR_NONE || (br_sel & type) != type) {
+		from_plm = kernel_ip(from) ? X86_BR_KERNEL : X86_BR_USER;
+		/*
+		 * If type does not correspond, then discard.
+		 * Especially filter out the entries whose from or to address
+		 * is a kernel address while only X86_BR_USER is set. This prevents
+		 * kernel address from being leaked into a user-space-only LBR stack.
+		 */
+		if ((type & ~X86_BR_PLM) == X86_BR_NONE || (br_sel & type) != type ||
+		    (!(br_sel & X86_BR_KERNEL) && (from_plm & X86_BR_KERNEL))) {
 			cpuc->lbr_entries[i].from = 0;
 			compress = true;
 		}
-- 
2.34.1



* [PATCH 6/8] perf/x86/intel: Validate return value of intel_pmu_init_hybrid()
  2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
                   ` (4 preceding siblings ...)
  2026-06-05  1:11 ` [PATCH 5/8] perf/x86/intel: Fix kernel address leakages in LBR stack Dapeng Mi
@ 2026-06-05  1:11 ` Dapeng Mi
  2026-06-05  1:36   ` sashiko-bot
  2026-06-05  1:11 ` [PATCH 7/8] perf/x86/intel: Drop fixed-counter PEBS constraints for baseline PEBS Dapeng Mi
  2026-06-05  1:11 ` [PATCH 8/8] perf/core: Fix kernel register info leak via hardware skid Dapeng Mi
  7 siblings, 1 reply; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi

The memory allocation for the x86_pmu.hybrid_pmu[] array in
intel_pmu_init_hybrid() can theoretically fail due to a memory shortage.
If this occurs, the initialization of the x86 hybrid PMU fails.

Currently, the code does not check the return value of
intel_pmu_init_hybrid(), which could lead to accesses to the unallocated
x86_pmu.hybrid_pmu[] array and potentially a kernel panic.

So, add a check for the return value of intel_pmu_init_hybrid() to
prevent invalid memory accesses in such scenarios. In addition, destroy
the created kmem cache when an error occurs.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 arch/x86/events/intel/core.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ea3ab3050a3b..efd9caa3502c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -7870,6 +7870,7 @@ __init int intel_pmu_init(void)
 	int version, i;
 	char *name;
 	struct x86_hybrid_pmu *pmu;
+	int ret;
 
 	/* Architectural Perfmon was introduced starting with Core "Yonah" */
 	if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
@@ -8539,7 +8540,9 @@ __init int intel_pmu_init(void)
 		 *
 		 * Initialize the common PerfMon capabilities here.
 		 */
-		intel_pmu_init_hybrid(hybrid_big_small);
+		ret = intel_pmu_init_hybrid(hybrid_big_small);
+		if (ret < 0)
+			goto err;
 
 		x86_pmu.pebs_latency_data = grt_latency_data;
 		x86_pmu.get_event_constraints = adl_get_event_constraints;
@@ -8597,7 +8600,9 @@ __init int intel_pmu_init(void)
 	case INTEL_METEORLAKE:
 	case INTEL_METEORLAKE_L:
 	case INTEL_ARROWLAKE_U:
-		intel_pmu_init_hybrid(hybrid_big_small);
+		ret = intel_pmu_init_hybrid(hybrid_big_small);
+		if (ret < 0)
+			goto err;
 
 		x86_pmu.pebs_latency_data = cmt_latency_data;
 		x86_pmu.get_event_constraints = mtl_get_event_constraints;
@@ -8628,7 +8633,9 @@ __init int intel_pmu_init(void)
 		pr_cont("Pantherlake Hybrid events, ");
 		name = "pantherlake_hybrid";
 
-		intel_pmu_init_hybrid(hybrid_big_small);
+		ret = intel_pmu_init_hybrid(hybrid_big_small);
+		if (ret < 0)
+			goto err;
 
 		/* Initialize big core specific PerfMon capabilities.*/
 		pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX];
@@ -8643,7 +8650,9 @@ __init int intel_pmu_init(void)
 		pr_cont("Arrowlake Hybrid events, ");
 		name = "arrowlake_hybrid";
 
-		intel_pmu_init_hybrid(hybrid_big_small);
+		ret = intel_pmu_init_hybrid(hybrid_big_small);
+		if (ret < 0)
+			goto err;
 
 		/* Initialize big core specific PerfMon capabilities.*/
 		pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX];
@@ -8660,7 +8669,9 @@ __init int intel_pmu_init(void)
 		pr_cont("Lunarlake Hybrid events, ");
 		name = "lunarlake_hybrid";
 
-		intel_pmu_init_hybrid(hybrid_big_small);
+		ret = intel_pmu_init_hybrid(hybrid_big_small);
+		if (ret < 0)
+			goto err;
 
 		/* Initialize big core specific PerfMon capabilities.*/
 		pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX];
@@ -8685,7 +8696,9 @@ __init int intel_pmu_init(void)
 		break;
 
 	case INTEL_ARROWLAKE_H:
-		intel_pmu_init_hybrid(hybrid_big_small_tiny);
+		ret = intel_pmu_init_hybrid(hybrid_big_small_tiny);
+		if (ret < 0)
+			goto err;
 
 		x86_pmu.pebs_latency_data = arl_h_latency_data;
 		x86_pmu.get_event_constraints = arl_h_get_event_constraints;
@@ -8720,7 +8733,9 @@ __init int intel_pmu_init(void)
 	case INTEL_NOVALAKE_L:
 		pr_cont("Novalake Hybrid events, ");
 		name = "novalake_hybrid";
-		intel_pmu_init_hybrid(hybrid_big_small);
+		ret = intel_pmu_init_hybrid(hybrid_big_small);
+		if (ret < 0)
+			goto err;
 
 		x86_pmu.pebs_latency_data = nvl_latency_data;
 		x86_pmu.get_event_constraints = mtl_get_event_constraints;
@@ -8885,6 +8900,10 @@ __init int intel_pmu_init(void)
 	intel_aux_output_init();
 
 	return 0;
+
+err:
+	kmem_cache_destroy(x86_get_pmu(smp_processor_id())->task_ctx_cache);
+	return ret;
 }
 
 /*
-- 
2.34.1



* [PATCH 7/8] perf/x86/intel: Drop fixed-counter PEBS constraints for baseline PEBS
  2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
                   ` (5 preceding siblings ...)
  2026-06-05  1:11 ` [PATCH 6/8] perf/x86/intel: Validate return value of intel_pmu_init_hybrid() Dapeng Mi
@ 2026-06-05  1:11 ` Dapeng Mi
  2026-06-05  1:11 ` [PATCH 8/8] perf/core: Fix kernel register info leak via hardware skid Dapeng Mi
  7 siblings, 0 replies; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi, Yi Lai

On SPR guests where pebs_baseline is not advertised, running:

$ ./perf record -e cpu/event=0x00,umask=0x01,i\
	 name=INST_RETIRED.PREC_DIST/p -c 10000 sleep 1

can trigger:

unchecked MSR access error: WRMSR to 0x3f1 ... in\
	 intel_pmu_pebs_enable_all()

Root cause:
SPR-specific PEBS constraints allow fixed-counter scheduling,
for example INST_RETIRED.PREC_DIST on fixed counter 0. In guests without
pebs_baseline, KVM does not support PEBS sampling on fixed counters,
so enabling such events reaches an invalid MSR programming path.

Fix:
Drop fixed-counter entries from the PEBS constraint table. Without
pebs_baseline, those fixed-counter PEBS events now resolve to empty
constraints and are not scheduled/enabled, avoiding the warning and the
broken guest PEBS path.

This is safe because, in pebs_baseline-capable cases, PEBS constraint
lookup already falls back to non-PEBS constraints when needed, and
fixed-counter constraints are effectively shared there.

Reported-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 arch/x86/events/intel/ds.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index cb72af9b61ce..5db15a92017a 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1447,10 +1447,6 @@ struct event_constraint intel_skl_pebs_event_constraints[] = {
 };
 
 struct event_constraint intel_icl_pebs_event_constraints[] = {
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x01c0, 0x100000000ULL),	/* old INST_RETIRED.PREC_DIST */
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0100, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL),	/* SLOTS */
-
 	INTEL_PLD_CONSTRAINT(0x1cd, 0xff),			/* MEM_TRANS_RETIRED.LOAD_LATENCY */
 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_LOADS */
 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_STORES */
@@ -1473,9 +1469,6 @@ struct event_constraint intel_icl_pebs_event_constraints[] = {
 };
 
 struct event_constraint intel_glc_pebs_event_constraints[] = {
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x100, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL),
-
 	INTEL_FLAGS_EVENT_CONSTRAINT(0xc0, 0xfe),
 	INTEL_PLD_CONSTRAINT(0x1cd, 0xfe),
 	INTEL_PSD_CONSTRAINT(0x2cd, 0x1),
@@ -1500,9 +1493,6 @@ struct event_constraint intel_glc_pebs_event_constraints[] = {
 };
 
 struct event_constraint intel_lnc_pebs_event_constraints[] = {
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x100, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL),
-
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x012a, 0x1),		/* OCR.* events */
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x012b, 0x1),		/* OCR.* events */
 
@@ -1534,9 +1524,6 @@ struct event_constraint intel_lnc_pebs_event_constraints[] = {
 };
 
 struct event_constraint intel_pnc_pebs_event_constraints[] = {
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x100, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
-	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x800000000ULL),
-
 	INTEL_HYBRID_LDLAT_CONSTRAINT(0x1cd, 0xfc),
 	INTEL_HYBRID_STLAT_CONSTRAINT(0x2cd, 0x3),
 	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf),	/* MEM_INST_RETIRED.STLB_MISS_LOADS */
-- 
2.34.1



* [PATCH 8/8] perf/core: Fix kernel register info leak via hardware skid
  2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
                   ` (6 preceding siblings ...)
  2026-06-05  1:11 ` [PATCH 7/8] perf/x86/intel: Drop fixed-counter PEBS constraints for baseline PEBS Dapeng Mi
@ 2026-06-05  1:11 ` Dapeng Mi
  2026-06-05  1:38   ` sashiko-bot
  7 siblings, 1 reply; 15+ messages in thread
From: Dapeng Mi @ 2026-06-05  1:11 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
	Falcon Thomas, Xudong Hao, Dapeng Mi, Mark Rutland

An unprivileged hardware perf event with exclude_kernel=1 can leak kernel
register data to user space via PERF_SAMPLE_REGS_INTR. Due to hardware
skid, a PMI may be delivered after the CPU has already entered kernel space
(Ring 0), bypassing the perf_allow_kernel() privilege check.

This vulnerability is severely exacerbated by upcoming support for SIMD
register sampling via XSAVES, which could expose sensitive kernel FPU
state (such as active cryptographic keys).

Fix this by ensuring that sampled register data is dropped if the event's
exclude_kernel attribute is set but the PMI catches the CPU in kernel mode.

Link: https://lore.kernel.org/all/20260529085613.CCAFB1F00893@smtp.kernel.org/
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 kernel/events/core.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7935d5663944..b7326bc3acd0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7800,10 +7800,21 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
 }
 
 static void perf_sample_regs_intr(struct perf_regs *regs_intr,
-				  struct pt_regs *regs)
+				  struct pt_regs *regs,
+				  bool exclude_kernel)
 {
-	regs_intr->regs = regs;
-	regs_intr->abi  = perf_reg_abi(current);
+	/*
+	 * Hardware skid can lead to PMI is delivered after
+	 * the CPU has already entered kernel mode. In that case,
+	 * user-space sampling must not expose kernel register state.
+	 */
+	if (exclude_kernel && !user_mode(regs)) {
+		regs_intr->abi = PERF_SAMPLE_REGS_ABI_NONE;
+		regs_intr->regs = NULL;
+	} else {
+		regs_intr->regs = regs;
+		regs_intr->abi = perf_reg_abi(current);
+	}
 }
 
 
@@ -8694,7 +8705,8 @@ void perf_prepare_sample(struct perf_sample_data *data,
 		/* regs dump ABI info */
 		int size = sizeof(u64);
 
-		perf_sample_regs_intr(&data->regs_intr, regs);
+		perf_sample_regs_intr(&data->regs_intr, regs,
+				      event->attr.exclude_kernel);
 
 		if (data->regs_intr.regs) {
 			u64 mask = event->attr.sample_regs_intr;
-- 
2.34.1



* Re: [PATCH 5/8] perf/x86/intel: Fix kernel address leakages in LBR stack
  2026-06-05  1:11 ` [PATCH 5/8] perf/x86/intel: Fix kernel address leakages in LBR stack Dapeng Mi
@ 2026-06-05  1:33   ` sashiko-bot
  2026-06-05  3:20     ` Mi, Dapeng
  0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-05  1:33 UTC (permalink / raw)
  To: Dapeng Mi; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] The exact same kernel address leak (KASLR bypass) vulnerability exists in the AMD LBR software filter (`amd_pmu_lbr_filter`), but the patch only fixes the Intel implementation.
- [High] On Arch LBR CPUs that do not support hardware branch type decoding, the software filter incorrectly classifies all branches as conditional jumps (JCC).
- [Medium] The software filter fails to drop branches executing in user space (e.g., SYSCALL) when only kernel branches are requested, diverging from hardware filter behavior.
--

commit e5f3b8c6179b0acb53df2e9f20348cf3b8f5a8be
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>

perf/x86/intel: Fix kernel address leakages in LBR stack

This commit fixes a kernel address leakage in the LBR stack on Intel
platforms prior to arch-LBR. It extends the software filtering in
intel_pmu_lbr_filter() to check the "from" address privilege level,
preventing kernel addresses from leaking to user space.

> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> index 16977e4c6f8a5..deef81c16571a 100644
> --- a/arch/x86/events/intel/lbr.c
> +++ b/arch/x86/events/intel/lbr.c
> @@ -1212,7 +1212,7 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
>  {
>  	u64 from, to;
>  	int br_sel = cpuc->br_sel;
> -	int i, j, type, to_plm;
> +	int i, j, type, to_plm, from_plm;
>  	bool compress = false;
>  
>  	/* if sampling all branches, then nothing to filter */

[Severity: High]
This is a pre-existing issue, but does this function correctly handle Arch
LBR CPUs that do not support hardware branch type decoding?

If an Arch LBR CPU does not support hardware branch type decoding
(LBR_INFO.BR_TYPE), the x86_lbr_type static key remains false and
get_lbr_br_type() returns 0.

Earlier in intel_pmu_lbr_filter(), the code checks:

    if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
        type <= ARCH_LBR_BR_TYPE_KNOWN_MAX) {

Since 0 <= 5 is always true, it maps arch_lbr_br_type_map[0] to X86_BR_JCC,
bypassing the software decoding fallback in branch_type().

Does this force all logged branches to be treated as conditional jumps on
these processors, destroying branch filtering accuracy?
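
The control flow described above can be modeled in a small userspace sketch. Everything in it is hypothetical: `hw_br_type_supported` stands in for the `x86_lbr_type` static key, `KNOWN_MAX` for `ARCH_LBR_BR_TYPE_KNOWN_MAX` (assumed to be 5, as the review implies), and only map index 0 mapping to a conditional jump comes from the review. The other map entries are illustrative placeholders, not the kernel's `arch_lbr_br_type_map`.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative branch classes; not the kernel's X86_BR_* bit values. */
enum br_class { BR_NONE, BR_JCC, BR_JMP, BR_IND_JMP, BR_RET, BR_CALL, BR_IND_CALL };

/* Hypothetical stand-in for ARCH_LBR_BR_TYPE_KNOWN_MAX (assumed 5). */
#define KNOWN_MAX 5

static const enum br_class hw_type_map[KNOWN_MAX + 1] = {
	BR_JCC,                                           /* 0: conditional jump */
	BR_JMP, BR_IND_JMP, BR_RET, BR_CALL, BR_IND_CALL, /* 1-5: placeholders */
};

/*
 * hw_type:   branch type reported by hardware (reads as 0 when hardware
 *            branch type decoding is unsupported).
 * sw_result: the class software decoding would report, modeled here as a
 *            precomputed value instead of real instruction decoding.
 */
enum br_class classify(bool hw_br_type_supported, int hw_type,
		       enum br_class sw_result)
{
	/* Without the support check, hw_type == 0 would always become BR_JCC. */
	if (hw_br_type_supported && hw_type >= 0 && hw_type <= KNOWN_MAX)
		return hw_type_map[hw_type];

	/* Fallback: trust software decoding. */
	return sw_result;
}
```

With `hw_br_type_supported` false, a return recorded with type 0 is classified as `BR_RET` by the fallback; removing the support check from the condition reproduces the JCC misclassification the review describes.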

[ ... ]

> @@ -1244,8 +1244,15 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
>  				type |= X86_BR_NO_TX;
>  		}
>  
> -		/* if type does not correspond, then discard */
> -		if ((type & ~X86_BR_PLM) == X86_BR_NONE || (br_sel & type) != type) {
> +		from_plm = kernel_ip(from) ? X86_BR_KERNEL : X86_BR_USER;
> +		/*
> +		 * If type does not correspond, then discard.
> +		 * Especially filter out the entries whose from or to address
> +		 * is a kernel address while only X86_BR_USER is set. This prevents
> +		 * kernel addresses from being leaked into a user-space-only LBR stack.
> +		 */
> +		if ((type & ~X86_BR_PLM) == X86_BR_NONE || (br_sel & type) != type ||
> +		    (!(br_sel & X86_BR_KERNEL) && (from_plm & X86_BR_KERNEL))) {

[Severity: Medium]
This is a pre-existing issue, but does this logic miss the symmetric check
for scrubbing user-to-kernel branches from kernel-only samples?

If a user requests only kernel branches (X86_BR_USER is not set), a SYSCALL
(which goes from user to kernel) will be retained because the hardware
filters based on destination CPL (kernel) and the software filter sees a
valid to_plm.

Should there be a check like:

    (!(br_sel & X86_BR_USER) && (from_plm & X86_BR_USER))

to prevent user-space addresses from polluting kernel-only branch traces?
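
A minimal sketch of a filter that checks both ends of a branch, which is the symmetric form of the check suggested above (not the code in the patch). It assumes hypothetical `BR_USER`/`BR_KERNEL` bits, a `br_sel` carrying only privilege bits, and a hypothetical `is_kernel_addr()` that treats addresses with the top bit set as kernel addresses, roughly how x86-64 splits its canonical address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical privilege bits; not the kernel's X86_BR_* values. */
#define BR_USER   0x1u
#define BR_KERNEL 0x2u

/* Assumption: an address with the top bit set is a kernel address. */
bool is_kernel_addr(uint64_t addr)
{
	return (addr >> 63) != 0;
}

unsigned int addr_plm(uint64_t addr)
{
	return is_kernel_addr(addr) ? BR_KERNEL : BR_USER;
}

/* Keep an entry only if both its source and target levels were requested. */
bool keep_entry(unsigned int br_sel, uint64_t from, uint64_t to)
{
	unsigned int plm = addr_plm(from) | addr_plm(to);

	return (br_sel & plm) == plm;
}
```

Under these assumptions a SYSRET-like entry (kernel to user) is dropped from a user-only request, and a SYSCALL-like entry (user to kernel) is dropped from a kernel-only request.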


[Severity: High]
This isn't a bug introduced by this patch, but does the exact same KASLR
bypass vulnerability exist in the AMD LBR software filter?

AMD's software filter in arch/x86/events/amd/lbr.c:amd_pmu_lbr_filter()
uses branch_type_fused(), which only factors in the destination privilege
level.

When a user requests only user-space branches (PERF_SAMPLE_BRANCH_USER),
hardware can still record kernel-to-user branches (like SYSRET) because
the destination is in user space.

Since the AMD filter evaluates (br_sel & type) == type as true based only
on the destination, it keeps the branch, exposing the kernel "from" address
to unprivileged userspace.

Should a similar fix be applied to amd_pmu_lbr_filter()?

>  			cpuc->lbr_entries[i].from = 0;
>  			compress = true;
>  		}
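
The discard step quoted above marks a rejected entry by zeroing its `from` field and then compacts the array. A minimal userspace sketch of that two-pass pattern for a user-only request follows; `struct entry`, `reject_user_only()`, and the top-bit-means-kernel rule are hypothetical, and like the quoted code it assumes no valid entry has `from == 0`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct entry {
	uint64_t from;
	uint64_t to;
};

/* Assumption: an address with the top bit set is a kernel address. */
static bool is_kernel(uint64_t addr)
{
	return (addr >> 63) != 0;
}

/* User-only request: reject any entry that touches a kernel address. */
static bool reject_user_only(const struct entry *e)
{
	return is_kernel(e->from) || is_kernel(e->to);
}

/*
 * Pass 1 marks rejected entries with from = 0; pass 2 compacts the
 * survivors in place. Returns the new number of entries.
 */
size_t filter_user_only(struct entry *ents, size_t n)
{
	bool compress = false;
	size_t i, j;

	for (i = 0; i < n; i++) {
		if (reject_user_only(&ents[i])) {
			ents[i].from = 0;
			compress = true;
		}
	}
	if (!compress)
		return n;

	for (i = 0, j = 0; i < n; i++) {
		if (ents[i].from)
			ents[j++] = ents[i];
	}
	return j;
}
```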

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260605011136.2043393-1-dapeng1.mi@linux.intel.com?part=5


* Re: [PATCH 6/8] perf/x86/intel: Validate return value of intel_pmu_init_hybrid()
  2026-06-05  1:11 ` [PATCH 6/8] perf/x86/intel: Validate return value of intel_pmu_init_hybrid() Dapeng Mi
@ 2026-06-05  1:36   ` sashiko-bot
  2026-06-05  3:29     ` Mi, Dapeng
  0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-05  1:36 UTC (permalink / raw)
  To: Dapeng Mi; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Unnecessary error handling and dead cleanup code for early boot memory allocation failure.
--

commit e6fc94eb7f63fa466e8e1aeaed9a0d18231a88dc
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>

perf/x86/intel: Validate return value of intel_pmu_init_hybrid()

This commit validates the return value of intel_pmu_init_hybrid() to prevent
attempts to access the uninitialized x86_pmu.hybrid_pmu[] array due to
allocation failure. It adds error paths that destroy the task_ctx_cache when
an error occurs.

> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index ea3ab3050a3b1..efd9caa3502c5 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -8539,7 +8540,9 @@ __init int intel_pmu_init(void)
>  		 *
>  		 * Initialize the common PerfMon capabilities here.
>  		 */
> -		intel_pmu_init_hybrid(hybrid_big_small);
> +		ret = intel_pmu_init_hybrid(hybrid_big_small);
> +		if (ret < 0)
> +			goto err;

[Severity: Low]
Is this error handling necessary for an early boot function?

Since intel_pmu_init() in arch/x86/events/intel/core.c is marked with __init,
it executes only during early boot. At this stage, if a physical memory
allocation fails, the system typically cannot boot anyway. Handling memory
allocation failures and returning -ENOMEM here is likely dead code.

[ ... ]

> @@ -8885,6 +8900,10 @@ __init int intel_pmu_init(void)
>  	intel_aux_output_init();
>  
>  	return 0;
> +
> +err:
> +	kmem_cache_destroy(x86_get_pmu(smp_processor_id())->task_ctx_cache);

[Severity: Low]
Will this cleanup code in intel_pmu_init() actually free anything?

If the allocation fails during intel_pmu_init_hybrid() and we jump to the err
label, task_ctx_cache has not been allocated yet. It is only allocated later
in the function via intel_pmu_lbr_init() or intel_pmu_arch_lbr_init().

This guarantees task_ctx_cache is NULL when this error path is taken, meaning
this call to kmem_cache_destroy() acts as a no-op.
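
Whether the `err:` path frees anything depends on whether the cache was created before the failing call. A userspace sketch of the pattern, with hypothetical `cache_create()`/`cache_destroy()` standing in for `kmem_cache_create()`/`kmem_cache_destroy()` (the real `kmem_cache_destroy()` also returns early on NULL) and hypothetical flags selecting the call order and the failure:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct cache { int dummy; };

int live_caches; /* caches created but not yet destroyed */

struct cache *cache_create(void)
{
	struct cache *c = malloc(sizeof(*c));

	if (c)
		live_caches++;
	return c;
}

/* Like kmem_cache_destroy(), tolerate NULL so error paths stay simple. */
void cache_destroy(struct cache *c)
{
	if (!c)
		return;
	live_caches--;
	free(c);
}

/*
 * Hypothetical init: optionally creates the cache first, then runs a
 * step that may fail, unwinding through a single err label.
 */
int pmu_init(bool cache_first, bool hybrid_fails, struct cache **out)
{
	struct cache *task_ctx_cache = NULL;
	int ret;

	if (cache_first)
		task_ctx_cache = cache_create();

	ret = hybrid_fails ? -ENOMEM : 0;
	if (ret < 0)
		goto err;

	*out = task_ctx_cache;
	return 0;

err:
	cache_destroy(task_ctx_cache); /* no-op when nothing was created */
	return ret;
}
```

The same `err:` path is a no-op when the cache does not exist yet and prevents a leak when it does, so it is safe under either call order.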

> +	return ret;
>  }
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260605011136.2043393-1-dapeng1.mi@linux.intel.com?part=6


* Re: [PATCH 8/8] perf/core: Fix kernel register info leak via hardware skid
  2026-06-05  1:11 ` [PATCH 8/8] perf/core: Fix kernel register info leak via hardware skid Dapeng Mi
@ 2026-06-05  1:38   ` sashiko-bot
  2026-06-05  3:42     ` Mi, Dapeng
  0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-05  1:38 UTC (permalink / raw)
  To: Dapeng Mi; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Valid hardware-captured user-space register samples from precise events (e.g., Intel PEBS) are incorrectly dropped as sensitive kernel data when the PMU interrupt (PMI) skids into kernel mode.

Pre-existing issues:
- [High] Hardware PMI skid still leaks kernel IPs and callchains to unprivileged users when exclude_kernel=1.
--

commit 8a7df272a8b8797c3525f28de5158f11f610df4d
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>

perf/core: Fix kernel register info leak via hardware skid
    
This patch addresses a security vulnerability where hardware skid can
cause a PMI to trigger in kernel space, leaking kernel register data
to user space via PERF_SAMPLE_REGS_INTR. It fixes this by dropping
sampled register data if the event has exclude_kernel set but the CPU
is in kernel mode.
    
Link: https://lore.kernel.org/all/20260529085613.CCAFB1F00893@smtp.kernel.org/

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7935d5663944e..b7326bc3acd00 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7800,10 +7800,21 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
>  }
>  
>  static void perf_sample_regs_intr(struct perf_regs *regs_intr,
> -				  struct pt_regs *regs)
> +				  struct pt_regs *regs,
> +				  bool exclude_kernel)
>  {
> -	regs_intr->regs = regs;
> -	regs_intr->abi  = perf_reg_abi(current);
> +	/*
> +	 * Hardware skid can cause the PMI to be delivered after
> +	 * the CPU has already entered kernel mode. In that case,
> +	 * user-space sampling must not expose kernel register state.
> +	 */
> +	if (exclude_kernel && !user_mode(regs)) {

[Severity: High]
Will this incorrectly drop valid hardware-captured user-space register
samples from precise events like Intel PEBS?

For precise events, the PMU driver populates a temporary pt_regs
with exact user-space registers captured by the hardware. Because PEBS
does not capture segment registers, the driver copies them from the
delayed PMI interrupt context:

arch/x86/events/intel/ds.c:setup_pebs_fixed_sample_data() {
    *regs = *iregs;
    ...
}

If the PMI skids and is delivered after the CPU enters kernel space,
iregs->cs reflects kernel mode, causing user_mode(regs) to evaluate
to false. This would result in these safe, hardware-captured user
registers being zeroed out here, which seems like it might break
unprivileged precise profiling.
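
The mechanism can be illustrated with a userspace model. On x86-64 the kernel's `user_mode()` tests the privilege level in the low two bits of the saved CS selector. The `struct regs` below is a hypothetical, trimmed-down stand-in for `struct pt_regs`, the selector values mirror the usual x86-64 Linux kernel and user CS, and `build_pebs_regs()` is a hypothetical mirror of copying the interrupt registers and then overwriting the hardware-captured fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of struct pt_regs. */
struct regs {
	uint64_t ip;
	uint64_t sp;
	uint64_t cs;
};

#define KERNEL_CS 0x10 /* RPL 0 */
#define USER_CS   0x33 /* RPL 3 */

/* Mirrors the x86-64 check: a nonzero RPL in CS means user mode. */
bool user_mode(const struct regs *r)
{
	return (r->cs & 3) != 0;
}

/*
 * Start from the interrupt-time registers, then apply the IP/SP that
 * hardware captured. CS is not captured, so it keeps its PMI-time value.
 */
struct regs build_pebs_regs(const struct regs *iregs, uint64_t hw_ip,
			    uint64_t hw_sp)
{
	struct regs r = *iregs;

	r.ip = hw_ip;
	r.sp = hw_sp;
	return r;
}
```

When the interrupt context is in kernel mode, the rebuilt record carries a user-space IP but still fails the CS-based check, which is the false positive described above.
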

> +		regs_intr->abi = PERF_SAMPLE_REGS_ABI_NONE;
> +		regs_intr->regs = NULL;
> +	} else {
> +		regs_intr->regs = regs;
> +		regs_intr->abi = perf_reg_abi(current);
> +	}
>  }

[ ... ]

> @@ -8694,7 +8705,8 @@ void perf_prepare_sample(struct perf_sample_data *data,
>  		/* regs dump ABI info */
>  		int size = sizeof(u64);
>  
> -		perf_sample_regs_intr(&data->regs_intr, regs);
> +		perf_sample_regs_intr(&data->regs_intr, regs,
> +				      event->attr.exclude_kernel);

[Severity: High]
This is a pre-existing issue, but does the hardware PMI skid still leak
kernel IPs and callchains to unprivileged users when exclude_kernel=1?

While this patch correctly zeroes out the register state to prevent
leaking the full CPU register context, it appears the same privilege
boundaries are not applied to PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN.

In perf_prepare_sample(), the kernel IP is unconditionally extracted
from regs if the PMI skid caught the CPU in kernel mode:

kernel/events/core.c:perf_prepare_sample() {
    if (filtered_sample_type & PERF_SAMPLE_IP) {
        data->ip = perf_instruction_pointer(event, regs);
        ...
    }
}

Similarly, unprivileged users can set exclude_kernel=1 and leave
exclude_callchain_kernel=0. On a skid, !user_mode(regs) evaluates to
true, causing perf_callchain() to unwind and expose the kernel stack
trace.

Could this allow a deterministic KASLR bypass, and should it be
addressed alongside the register clearing?
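
One way to frame the question is a single predicate applied to every field derived from the interrupted context, not only the register dump. This is a hypothetical sketch, not code from this series: `struct sample`, `sanitize_sample()`, and the choice to zero the IP and drop kernel callchain frames are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trimmed sample record. */
struct sample {
	uint64_t ip;
	size_t nr_kernel_frames; /* kernel entries in the callchain */
	bool has_intr_regs;
};

/*
 * If the event excludes the kernel but the interrupt landed in kernel
 * mode (skid), strip every field taken from the kernel context.
 */
void sanitize_sample(struct sample *s, bool exclude_kernel, bool in_user_mode)
{
	if (!exclude_kernel || in_user_mode)
		return;

	s->ip = 0;
	s->nr_kernel_frames = 0;
	s->has_intr_regs = false;
}
```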

>  
>  		if (data->regs_intr.regs) {
>  			u64 mask = event->attr.sample_regs_intr;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260605011136.2043393-1-dapeng1.mi@linux.intel.com?part=8


* Re: [PATCH 5/8] perf/x86/intel: Fix kernel address leakages in LBR stack
  2026-06-05  1:33   ` sashiko-bot
@ 2026-06-05  3:20     ` Mi, Dapeng
  0 siblings, 0 replies; 15+ messages in thread
From: Mi, Dapeng @ 2026-06-05  3:20 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users


On 6/5/2026 9:33 AM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] The exact same kernel address leak (KASLR bypass) vulnerability exists in the AMD LBR software filter (`amd_pmu_lbr_filter`), but the patch only fixes the Intel implementation.
> - [High] On Arch LBR CPUs that do not support hardware branch type decoding, the software filter incorrectly classifies all branches as conditional jumps (JCC).
> - [Medium] The software filter fails to drop branches executing in user space (e.g., SYSCALL) when only kernel branches are requested, diverging from hardware filter behavior.
> --
>
> commit e5f3b8c6179b0acb53df2e9f20348cf3b8f5a8be
> Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> perf/x86/intel: Fix kernel address leakages in LBR stack
>
> This commit fixes a kernel address leakage in the LBR stack on Intel
> platforms prior to arch-LBR. It extends the software filtering in
> intel_pmu_lbr_filter() to check the "from" address privilege level,
> preventing kernel addresses from leaking to user space.
>
>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
>> index 16977e4c6f8a5..deef81c16571a 100644
>> --- a/arch/x86/events/intel/lbr.c
>> +++ b/arch/x86/events/intel/lbr.c
>> @@ -1212,7 +1212,7 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
>>  {
>>  	u64 from, to;
>>  	int br_sel = cpuc->br_sel;
>> -	int i, j, type, to_plm;
>> +	int i, j, type, to_plm, from_plm;
>>  	bool compress = false;
>>  
>>  	/* if sampling all branches, then nothing to filter */
> [Severity: High]
> This is a pre-existing issue, but does this function correctly handle Arch
> LBR CPUs that do not support hardware branch type decoding?
>
> If an Arch LBR CPU does not support hardware branch type decoding
> (LBR_INFO.BR_TYPE), the x86_lbr_type static key remains false and
> get_lbr_br_type() returns 0.
>
> Earlier in intel_pmu_lbr_filter(), the code checks:
>
>     if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
>         type <= ARCH_LBR_BR_TYPE_KNOWN_MAX) {
>
> Since 0 <= 5 is always true, it maps arch_lbr_br_type_map[0] to X86_BR_JCC,
> bypassing the software decoding fallback in branch_type().
>
> Does this force all logged branches to be treated as conditional jumps on
> these processors, destroying branch filtering accuracy?

Yes, it's an issue. If arch-LBR doesn't support hardware branch type
decoding, it should fall back to software branch type decoding, like this:

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index deef81c16571..fcdbb6feda73 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1232,6 +1232,7 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
                 * OTHER_BRANCH branch type still rely on software decoding.
                 */
                if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
+                   static_branch_likely(&x86_lbr_type) &&
                    type <= ARCH_LBR_BR_TYPE_KNOWN_MAX) {
                        to_plm = kernel_ip(to) ? X86_BR_KERNEL : X86_BR_USER;
                        type = arch_lbr_br_type_map[type] | to_plm;


>
> [ ... ]
>
>> @@ -1244,8 +1244,15 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
>>  				type |= X86_BR_NO_TX;
>>  		}
>>  
>> -		/* if type does not correspond, then discard */
>> -		if ((type & ~X86_BR_PLM) == X86_BR_NONE || (br_sel & type) != type) {
>> +		from_plm = kernel_ip(from) ? X86_BR_KERNEL : X86_BR_USER;
>> +		/*
>> +		 * If type does not correspond, then discard.
>> +		 * Especially filter out the entries whose from or to address
>> +		 * is a kernel address while only X86_BR_USER is set. This prevents
>> +		 * kernel addresses from being leaked into a user-space-only LBR stack.
>> +		 */
>> +		if ((type & ~X86_BR_PLM) == X86_BR_NONE || (br_sel & type) != type ||
>> +		    (!(br_sel & X86_BR_KERNEL) && (from_plm & X86_BR_KERNEL))) {
> [Severity: Medium]
> This is a pre-existing issue, but does this logic miss the symmetric check
> for scrubbing user-to-kernel branches from kernel-only samples?
>
> If a user requests only kernel branches (X86_BR_USER is not set), a SYSCALL
> (which goes from user to kernel) will be retained because the hardware
> filters based on destination CPL (kernel) and the software filter sees a
> valid to_plm.
>
> Should there be a check like:
>
>     (!(br_sel & X86_BR_USER) && (from_plm & X86_BR_USER))
>
> to prevent user-space addresses from polluting kernel-only branch traces?

Yes, this can be enhanced.


>
>
> [Severity: High]
> This isn't a bug introduced by this patch, but does the exact same KASLR
> bypass vulnerability exist in the AMD LBR software filter?
>
> AMD's software filter in arch/x86/events/amd/lbr.c:amd_pmu_lbr_filter()
> uses branch_type_fused(), which only factors in the destination privilege
> level.
>
> When a user requests only user-space branches (PERF_SAMPLE_BRANCH_USER),
> hardware can still record kernel-to-user branches (like SYSRET) because
> the destination is in user space.
>
> Since the AMD filter evaluates (br_sel & type) == type as true based only
> on the destination, it keeps the branch, exposing the kernel "from" address
> to unprivileged userspace.
>
> Should a similar fix be applied to amd_pmu_lbr_filter()?

The AMD folks are already aware of this issue; an AMD-specific fix will be posted separately.

Thanks.


>
>>  			cpuc->lbr_entries[i].from = 0;
>>  			compress = true;
>>  		}


* Re: [PATCH 6/8] perf/x86/intel: Validate return value of intel_pmu_init_hybrid()
  2026-06-05  1:36   ` sashiko-bot
@ 2026-06-05  3:29     ` Mi, Dapeng
  0 siblings, 0 replies; 15+ messages in thread
From: Mi, Dapeng @ 2026-06-05  3:29 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users


On 6/5/2026 9:36 AM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Low] Unnecessary error handling and dead cleanup code for early boot memory allocation failure.
> --
>
> commit e6fc94eb7f63fa466e8e1aeaed9a0d18231a88dc
> Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> perf/x86/intel: Validate return value of intel_pmu_init_hybrid()
>
> This commit validates the return value of intel_pmu_init_hybrid() to prevent
> attempts to access the uninitialized x86_pmu.hybrid_pmu[] array due to
> allocation failure. It adds error paths that destroy the task_ctx_cache when
> an error occurs.
>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index ea3ab3050a3b1..efd9caa3502c5 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -8539,7 +8540,9 @@ __init int intel_pmu_init(void)
>>  		 *
>>  		 * Initialize the common PerfMon capabilities here.
>>  		 */
>> -		intel_pmu_init_hybrid(hybrid_big_small);
>> +		ret = intel_pmu_init_hybrid(hybrid_big_small);
>> +		if (ret < 0)
>> +			goto err;
> [Severity: Low]
> Is this error handling necessary for an early boot function?
>
> Since intel_pmu_init() in arch/x86/events/intel/core.c is marked with __init,
> it executes only during early boot. At this stage, if a physical memory
> allocation fails, the system typically cannot boot anyway. Handling memory
> allocation failures and returning -ENOMEM here is likely dead code.

Even though it's early boot, it's better not to trigger an invalid memory
access error, which could be misleading.


>
> [ ... ]
>
>> @@ -8885,6 +8900,10 @@ __init int intel_pmu_init(void)
>>  	intel_aux_output_init();
>>  
>>  	return 0;
>> +
>> +err:
>> +	kmem_cache_destroy(x86_get_pmu(smp_processor_id())->task_ctx_cache);
> [Severity: Low]
> Will this cleanup code in intel_pmu_init() actually free anything?
>
> If the allocation fails during intel_pmu_init_hybrid() and we jump to the err
> label, task_ctx_cache has not been allocated yet. It is only allocated later
> in the function via intel_pmu_lbr_init() or intel_pmu_arch_lbr_init().
>
> This guarantees task_ctx_cache is NULL when this error path is taken, meaning
> this call to kmem_cache_destroy() acts as a no-op.

No. intel_pmu_arch_lbr_init(), which is called before
intel_pmu_init_hybrid(), can create the slab cache "task_ctx_cache" that
LBR needs. If intel_pmu_init_hybrid() returns an error, the created slab
cache needs to be destroyed.

Thanks.

>
>> +	return ret;
>>  }
>>


* Re: [PATCH 8/8] perf/core: Fix kernel register info leak via hardware skid
  2026-06-05  1:38   ` sashiko-bot
@ 2026-06-05  3:42     ` Mi, Dapeng
  0 siblings, 0 replies; 15+ messages in thread
From: Mi, Dapeng @ 2026-06-05  3:42 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users


On 6/5/2026 9:38 AM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>
> New issues:
> - [High] Valid hardware-captured user-space register samples from precise events (e.g., Intel PEBS) are incorrectly dropped as sensitive kernel data when the PMU interrupt (PMI) skids into kernel mode.
>
> Pre-existing issues:
> - [High] Hardware PMI skid still leaks kernel IPs and callchains to unprivileged users when exclude_kernel=1.
> --
>
> commit 8a7df272a8b8797c3525f28de5158f11f610df4d
> Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> perf/core: Fix kernel register info leak via hardware skid
>     
> This patch addresses a security vulnerability where hardware skid can
> cause a PMI to trigger in kernel space, leaking kernel register data
> to user space via PERF_SAMPLE_REGS_INTR. It fixes this by dropping
> sampled register data if the event has exclude_kernel set but the CPU
> is in kernel mode.
>     
> Link: https://lore.kernel.org/all/20260529085613.CCAFB1F00893@smtp.kernel.org/
>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 7935d5663944e..b7326bc3acd00 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -7800,10 +7800,21 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
>>  }
>>  
>>  static void perf_sample_regs_intr(struct perf_regs *regs_intr,
>> -				  struct pt_regs *regs)
>> +				  struct pt_regs *regs,
>> +				  bool exclude_kernel)
>>  {
>> -	regs_intr->regs = regs;
>> -	regs_intr->abi  = perf_reg_abi(current);
>> +	/*
>> +	 * Hardware skid can cause the PMI to be delivered after
>> +	 * the CPU has already entered kernel mode. In that case,
>> +	 * user-space sampling must not expose kernel register state.
>> +	 */
>> +	if (exclude_kernel && !user_mode(regs)) {
> [Severity: High]
> Will this incorrectly drop valid hardware-captured user-space register
> samples from precise events like Intel PEBS?
>
> For precise events, the PMU driver populates a temporary pt_regs
> with exact user-space registers captured by the hardware. Because PEBS
> does not capture segment registers, the driver copies them from the
> delayed PMI interrupt context:
>
> arch/x86/events/intel/ds.c:setup_pebs_fixed_sample_data() {
>     *regs = *iregs;
>     ...
> }
>
> If the PMI skids and is delivered after the CPU enters kernel space,
> iregs->cs reflects kernel mode, causing user_mode(regs) to evaluate
> to false. This would result in these safe, hardware-captured user
> registers being zeroed out here, which seems like it might break
> unprivileged precise profiling.

Yes, that's true. But PEBS has no way to sample the CS register, so the CS
value can only come from the PMI's interrupt context. There seems to be no
better way than dropping all of this register data from PEBS records.


>
>> +		regs_intr->abi = PERF_SAMPLE_REGS_ABI_NONE;
>> +		regs_intr->regs = NULL;
>> +	} else {
>> +		regs_intr->regs = regs;
>> +		regs_intr->abi = perf_reg_abi(current);
>> +	}
>>  }
> [ ... ]
>
>> @@ -8694,7 +8705,8 @@ void perf_prepare_sample(struct perf_sample_data *data,
>>  		/* regs dump ABI info */
>>  		int size = sizeof(u64);
>>  
>> -		perf_sample_regs_intr(&data->regs_intr, regs);
>> +		perf_sample_regs_intr(&data->regs_intr, regs,
>> +				      event->attr.exclude_kernel);
> [Severity: High]
> This is a pre-existing issue, but does the hardware PMI skid still leak
> kernel IPs and callchains to unprivileged users when exclude_kernel=1?
>
> While this patch correctly zeroes out the register state to prevent
> leaking the full CPU register context, it appears the same privilege
> boundaries are not applied to PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN.
>
> In perf_prepare_sample(), the kernel IP is unconditionally extracted
> from regs if the PMI skid caught the CPU in kernel mode:
>
> kernel/events/core.c:perf_prepare_sample() {
>     if (filtered_sample_type & PERF_SAMPLE_IP) {
>         data->ip = perf_instruction_pointer(event, regs);
>         ...
>     }
> }
>
> Similarly, unprivileged users can set exclude_kernel=1 and leave
> exclude_callchain_kernel=0. On a skid, !user_mode(regs) evaluates to
> true, causing perf_callchain() to unwind and expose the kernel stack
> trace.
>
> Could this allow a deterministic KASLR bypass, and should it be
> addressed alongside the register clearing?

Hmm, makes sense. Extra changes are needed to prevent these two leaks.

Thanks.


>
>>  
>>  		if (data->regs_intr.regs) {
>>  			u64 mask = event->attr.sample_regs_intr;



Thread overview: 15+ messages
2026-06-05  1:11 [PATCH 0/8] perf/x86: Miscellaneous PMU bug fixes Dapeng Mi
2026-06-05  1:11 ` [PATCH 1/8] perf/x86/intel: Remove anythread_deprecated bit from perf_capabilities Dapeng Mi
2026-06-05  1:11 ` [PATCH 2/8] perf/x86: Introduce is_x86_pmu() helper Dapeng Mi
2026-06-05  1:11 ` [PATCH 3/8] perf/x86: Update cap_user_rdpmc base on rdpmc user disable state Dapeng Mi
2026-06-05  1:11 ` [PATCH 4/8] perf/x86/intel: Fix redundant branch type check in intel_pmu_lbr_filter() Dapeng Mi
2026-06-05  1:11 ` [PATCH 5/8] perf/x86/intel: Fix kernel address leakages in LBR stack Dapeng Mi
2026-06-05  1:33   ` sashiko-bot
2026-06-05  3:20     ` Mi, Dapeng
2026-06-05  1:11 ` [PATCH 6/8] perf/x86/intel: Validate return value of intel_pmu_init_hybrid() Dapeng Mi
2026-06-05  1:36   ` sashiko-bot
2026-06-05  3:29     ` Mi, Dapeng
2026-06-05  1:11 ` [PATCH 7/8] perf/x86/intel: Drop fixed-counter PEBS constraints for baseline PEBS Dapeng Mi
2026-06-05  1:11 ` [PATCH 8/8] perf/core: Fix kernel register info leak via hardware skid Dapeng Mi
2026-06-05  1:38   ` sashiko-bot
2026-06-05  3:42     ` Mi, Dapeng
