* [Patch v2 0/6] x86 perf bug fixes and optimization
@ 2025-08-11 9:00 Dapeng Mi
2025-08-11 9:00 ` [Patch v2 1/6] perf/x86/intel: Use early_initcall() to hook bts_init() Dapeng Mi
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: Dapeng Mi @ 2025-08-11 9:00 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi
This patch set aggregates previously posted individual perf/x86 patches to
make review more convenient. Patches 1-3/6 fix three x86 perf issues, and
the last three patches enhance/optimize the x86 perf code.
Changes:
v1 -> v2:
* Rebase to 6.17-rc1.
* No code changes.
Tests:
* Ran perf stat/record commands on an Intel Sapphire Rapids platform; no
issues were found.
History:
V1:
* Patch 1/6: https://lore.kernel.org/all/20250606111606.84350-1-dapeng1.mi@linux.intel.com/
* Patch 2/6: https://lore.kernel.org/all/20250529080236.2552247-1-dapeng1.mi@linux.intel.com/
* Patch 3/6: https://lore.kernel.org/all/20250718062602.21444-1-dapeng1.mi@linux.intel.com/
* Patches 4-6/6: https://lore.kernel.org/all/20250717090302.11316-1-dapeng1.mi@linux.intel.com/
Dapeng Mi (6):
perf/x86/intel: Use early_initcall() to hook bts_init()
perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
perf/x86: Check if cpuc->events[*] pointer exists before accessing it
perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag
perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to
BIT_ULL(48)
perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into
INTEL_FIXED_BITS_MASK
arch/x86/events/core.c | 3 +++
arch/x86/events/intel/bts.c | 2 +-
arch/x86/events/intel/core.c | 27 +++++++++++++-------------
arch/x86/events/intel/ds.c | 13 ++++++-------
arch/x86/include/asm/msr-index.h | 14 +++++++------
arch/x86/include/asm/perf_event.h | 8 ++++++--
arch/x86/kvm/pmu.h | 2 +-
tools/arch/x86/include/asm/msr-index.h | 14 +++++++------
8 files changed, 47 insertions(+), 36 deletions(-)
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
--
2.34.1
* [Patch v2 1/6] perf/x86/intel: Use early_initcall() to hook bts_init()
2025-08-11 9:00 [Patch v2 0/6] x86 perf bug fixes and optimization Dapeng Mi
@ 2025-08-11 9:00 ` Dapeng Mi
2025-08-11 9:00 ` [Patch v2 2/6] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Dapeng Mi
` (4 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Dapeng Mi @ 2025-08-11 9:00 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi
Since commit d971342d38bf ("perf/x86/intel: Decouple BTS initialization
from PEBS initialization"), x86_pmu.bts is initialized in bts_init(),
which is hooked by arch_initcall(), whereas init_hw_perf_events() is
hooked by early_initcall(). Once the core PMU is initialized, the NMI
watchdog initialization runs immediately, before bts_init() is called.
As a result the BTS buffer is not really initialized, since bts_init()
has not run yet and x86_pmu.bts is still false at that point. Worse, the
BTS buffer would never be initialized afterwards unless all core PMU
events are freed and reserve_ds_buffers() is called again.
Thus, aligning with init_hw_perf_events(), hook bts_init() with
early_initcall() to ensure x86_pmu.bts is initialized before the NMI
watchdog initialization.
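Roughly, the problematic ordering described above (a simplified sketch of
the sequence, not the exact code):
        early_initcall(init_hw_perf_events);    /* core PMU is registered */
        /* the NMI watchdog creates its perf event right afterwards;
         * reserve_ds_buffers() sees x86_pmu.bts == false and skips the
         * BTS buffer */
        arch_initcall(bts_init);                /* too late: only now sets x86_pmu.bts */
With the change, bts_init() is hooked by early_initcall() as well and runs
before the NMI watchdog event is created.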
Fixes: d971342d38bf ("perf/x86/intel: Decouple BTS initialization from PEBS initialization")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
arch/x86/events/intel/bts.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 61da6b8a3d51..cbac54cb3a9e 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -643,4 +643,4 @@ static __init int bts_init(void)
return perf_pmu_register(&bts_pmu, "intel_bts", -1);
}
-arch_initcall(bts_init);
+early_initcall(bts_init);
--
2.34.1
* [Patch v2 2/6] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
2025-08-11 9:00 [Patch v2 0/6] x86 perf bug fixes and optimization Dapeng Mi
2025-08-11 9:00 ` [Patch v2 1/6] perf/x86/intel: Use early_initcall() to hook bts_init() Dapeng Mi
@ 2025-08-11 9:00 ` Dapeng Mi
2025-08-11 9:00 ` [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it Dapeng Mi
` (3 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Dapeng Mi @ 2025-08-11 9:00 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi
When running perf_fuzzer on PTL, the below "unchecked MSR access error"
is sometimes seen when accessing the IA32_PMC_x_CFG_B MSRs.
[ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
[ 55.611280] Call Trace:
[ 55.611282] <TASK>
[ 55.611284] ? intel_pmu_config_acr+0x87/0x160
[ 55.611289] intel_pmu_enable_acr+0x6d/0x80
[ 55.611291] intel_pmu_enable_event+0xce/0x460
[ 55.611293] x86_pmu_start+0x78/0xb0
[ 55.611297] x86_pmu_enable+0x218/0x3a0
[ 55.611300] ? x86_pmu_enable+0x121/0x3a0
[ 55.611302] perf_pmu_enable+0x40/0x50
[ 55.611307] ctx_resched+0x19d/0x220
[ 55.611309] __perf_install_in_context+0x284/0x2f0
[ 55.611311] ? __pfx_remote_function+0x10/0x10
[ 55.611314] remote_function+0x52/0x70
[ 55.611317] ? __pfx_remote_function+0x10/0x10
[ 55.611319] generic_exec_single+0x84/0x150
[ 55.611323] smp_call_function_single+0xc5/0x1a0
[ 55.611326] ? __pfx_remote_function+0x10/0x10
[ 55.611329] perf_install_in_context+0xd1/0x1e0
[ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
[ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
[ 55.611336] __x64_sys_perf_event_open+0x26/0x30
[ 55.611337] x64_sys_call+0x1d8e/0x20c0
[ 55.611339] do_syscall_64+0x4f/0x120
[ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
On PTL, GP counters 0 and 1 don't support the auto counter reload
feature, so writing 1 to bit 0 of the CFG_B MSR, which requests enabling
auto counter reload on GP counter 0, triggers a #GP.
The root cause of this issue is that the check of the auto counter
reload (ACR) counter mask from user space in the
intel_pmu_acr_late_setup() helper is incorrect. As a result, an invalid
ACR counter mask from user space can be set into hw.config1, then
written into the CFG_B MSRs, triggering the MSR access warning.
E.g., a user may create a perf event with an ACR counter mask of
config2=0xcb while only 1 event is created, so "cpuc->n_events" is 1.
The correct check condition should be "i + idx >= cpuc->n_events"
instead of "i + idx > cpuc->n_events" (it looks like a typo). Otherwise,
the counter mask is traversed one entry too far and the invalid
"cpuc->assign[1]" bit (bit 0) is set into hw.config1, causing the MSR
access error.
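For illustration, with i == 0, cpuc->n_events == 1 and config2 == 0xcb,
the loop roughly behaves as below (a simplified sketch, not the exact
kernel code):
        /* idx walks the set bits of config2 (0xcb): 0, 1, 3, 6, 7 */
        for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
                /* old check "i + idx > cpuc->n_events": idx == 1 slips
                 * through (1 > 1 is false), so the stale cpuc->assign[1]
                 * (bit 0) leaks into hw.config1 */
                /* new check "i + idx >= cpuc->n_events": idx == 1 is
                 * rejected (1 >= 1) and the loop bails out */
                __set_bit(cpuc->assign[i + idx], (unsigned long *)&event->hw.config1);
        }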
Besides, also check whether the events corresponding to the ACR counter
mask are ACR events. If not, filter out those counter mask bits. If an
event is not an ACR event, it could be scheduled on a HW counter which
doesn't support ACR, and it's invalid to add its counter index to the
ACR counter mask.
Furthermore, remove the WARN_ON_ONCE(), since it's easily triggered (a
user could set any invalid ACR counter mask) and the warning message
could mislead users.
Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/intel/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c2fb729c270e..15da60cf69f2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2997,7 +2997,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
if (event->group_leader != leader->group_leader)
break;
for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
- if (WARN_ON_ONCE(i + idx > cpuc->n_events))
+ if (i + idx >= cpuc->n_events ||
+ !is_acr_event_group(cpuc->event_list[i + idx]))
return;
__set_bit(cpuc->assign[i + idx], (unsigned long *)&event->hw.config1);
}
--
2.34.1
* [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
2025-08-11 9:00 [Patch v2 0/6] x86 perf bug fixes and optimization Dapeng Mi
2025-08-11 9:00 ` [Patch v2 1/6] perf/x86/intel: Use early_initcall() to hook bts_init() Dapeng Mi
2025-08-11 9:00 ` [Patch v2 2/6] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Dapeng Mi
@ 2025-08-11 9:00 ` Dapeng Mi
2025-08-11 23:32 ` Liang, Kan
2025-08-19 8:45 ` Peter Zijlstra
2025-08-11 9:00 ` [Patch v2 4/6] perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag Dapeng Mi
` (2 subsequent siblings)
5 siblings, 2 replies; 16+ messages in thread
From: Dapeng Mi @ 2025-08-11 9:00 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi,
kernel test robot
The PMI handler could disable some events due to interrupt throttling
and clear the corresponding entries in the cpuc->events[] array.
perf_event_overflow()
-> __perf_event_overflow()
->__perf_event_account_interrupt()
-> perf_event_throttle_group()
-> perf_event_throttle()
-> event->pmu->stop()
-> x86_pmu_stop()
Moreover, the PMI is an NMI on x86 and could interrupt other perf code
such as setup_pebs_adaptive_sample_data(). So once the PMI handling
finishes and returns into setup_pebs_adaptive_sample_data(), it could
find that cpuc->events[*] has become NULL; dereferencing this NULL
pointer triggers an invalid memory access and eventually leads to kernel
crashes.
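Roughly, the window described above (sketch only):
        /* e.g. in a PEBS drain path outside the PMI handler */
        event = cpuc->events[bit];      /* still non-NULL here */
        /* <NMI>: throttling stops the event, x86_pmu_stop() clears
         * cpuc->events[bit] to NULL </NMI> */
        /* a later cpuc->events[bit] load in the same path now sees NULL */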
Thus, add a NULL check before accessing the cpuc->events[*] pointer.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: kernel test robot <oliver.sang@intel.com>
---
arch/x86/events/core.c | 3 +++
arch/x86/events/intel/core.c | 6 +++++-
arch/x86/events/intel/ds.c | 13 ++++++-------
3 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7610f26dfbd9..f0a3bc57157d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1711,6 +1711,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
continue;
event = cpuc->events[idx];
+ if (!event)
+ continue;
+
last_period = event->hw.last_period;
val = static_call(x86_pmu_update)(event);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 15da60cf69f2..386717b75a09 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2718,6 +2718,8 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
if (!is_topdown_idx(idx))
continue;
other = cpuc->events[idx];
+ if (!other)
+ continue;
other->hw.saved_slots = slots;
other->hw.saved_metric = metrics;
}
@@ -2761,6 +2763,8 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
if (!is_topdown_idx(idx))
continue;
other = cpuc->events[idx];
+ if (!other)
+ continue;
__icl_update_topdown_event(other, slots, metrics,
event ? event->hw.saved_slots : 0,
event ? event->hw.saved_metric : 0);
@@ -3138,7 +3142,7 @@ static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs, X86_PMC_IDX_MAX) {
event = cpuc->events[bit];
- if (!event->attr.precise_ip)
+ if (!event || !event->attr.precise_ip)
continue;
perf_sample_data_init(data, 0, event->hw.last_period);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index c0b7ac1c7594..b23c49e2e06f 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2480,6 +2480,8 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, u64
*/
for_each_set_bit(bit, (unsigned long *)&pebs_enabled, X86_PMC_IDX_MAX) {
event = cpuc->events[bit];
+ if (!event)
+ continue;
if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
intel_pmu_save_and_restart_reload(event, 0);
}
@@ -2579,10 +2581,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
continue;
event = cpuc->events[bit];
- if (WARN_ON_ONCE(!event))
- continue;
-
- if (WARN_ON_ONCE(!event->attr.precise_ip))
+ if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
continue;
/* log dropped samples number */
@@ -2645,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
event = cpuc->events[bit];
-
- if (WARN_ON_ONCE(!event) ||
- WARN_ON_ONCE(!event->attr.precise_ip))
+ if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
continue;
if (counts[bit]++) {
@@ -2663,6 +2660,8 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
continue;
event = cpuc->events[bit];
+ if (!event)
+ continue;
__intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
counts[bit], setup_pebs_adaptive_sample_data);
--
2.34.1
* [Patch v2 4/6] perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag
2025-08-11 9:00 [Patch v2 0/6] x86 perf bug fixes and optimization Dapeng Mi
` (2 preceding siblings ...)
2025-08-11 9:00 ` [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it Dapeng Mi
@ 2025-08-11 9:00 ` Dapeng Mi
2025-08-11 9:00 ` [Patch v2 5/6] perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48) Dapeng Mi
2025-08-11 9:00 ` [Patch v2 6/6] perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK Dapeng Mi
5 siblings, 0 replies; 16+ messages in thread
From: Dapeng Mi @ 2025-08-11 9:00 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi, Yi Lai
IA32_PERF_CAPABILITIES.PEBS_TIMING_INFO[bit 17] is introduced to
indicate whether timed PEBS is supported. Timed PEBS adds a new "retired
latency" field to the basic info group to report timing information.
Detailed information about timed PEBS can be found in section 8.4.1
"Timed Processor Event Based Sampling" of "Intel Architecture
Instruction Set Extensions and Future Features".
Add the PERF_CAP_PEBS_TIMING_INFO flag so that the KVM module can
leverage it to expose the timed PEBS feature to guests.
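As a rough illustration of how the new flag can be consumed (sketch
only; the surrounding code is illustrative):
        u64 caps;
        rdmsrq(MSR_IA32_PERF_CAPABILITIES, caps);
        if (caps & PERF_CAP_PEBS_TIMING_INFO) {
                /* timed PEBS is available: PEBS basic info records
                 * carry the "retired latency" field */
        }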
Moreover, opportunistically refine the indentation so that these macros
share consistent indents.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
---
arch/x86/include/asm/msr-index.h | 14 ++++++++------
tools/arch/x86/include/asm/msr-index.h | 14 ++++++++------
2 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b65c3ba5fa14..f627196eb796 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -315,12 +315,14 @@
#define PERF_CAP_PT_IDX 16
#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6
-#define PERF_CAP_PEBS_TRAP BIT_ULL(6)
-#define PERF_CAP_ARCH_REG BIT_ULL(7)
-#define PERF_CAP_PEBS_FORMAT 0xf00
-#define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
-#define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
- PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
+#define PERF_CAP_PEBS_TRAP BIT_ULL(6)
+#define PERF_CAP_ARCH_REG BIT_ULL(7)
+#define PERF_CAP_PEBS_FORMAT 0xf00
+#define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
+#define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)
+#define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
+ PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
+ PERF_CAP_PEBS_TIMING_INFO)
#define MSR_IA32_RTIT_CTL 0x00000570
#define RTIT_CTL_TRACEEN BIT(0)
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index 5cfb5d74dd5f..daebfd926f08 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -315,12 +315,14 @@
#define PERF_CAP_PT_IDX 16
#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6
-#define PERF_CAP_PEBS_TRAP BIT_ULL(6)
-#define PERF_CAP_ARCH_REG BIT_ULL(7)
-#define PERF_CAP_PEBS_FORMAT 0xf00
-#define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
-#define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
- PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
+#define PERF_CAP_PEBS_TRAP BIT_ULL(6)
+#define PERF_CAP_ARCH_REG BIT_ULL(7)
+#define PERF_CAP_PEBS_FORMAT 0xf00
+#define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
+#define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)
+#define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
+ PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
+ PERF_CAP_PEBS_TIMING_INFO)
#define MSR_IA32_RTIT_CTL 0x00000570
#define RTIT_CTL_TRACEEN BIT(0)
--
2.34.1
* [Patch v2 5/6] perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48)
2025-08-11 9:00 [Patch v2 0/6] x86 perf bug fixes and optimization Dapeng Mi
` (3 preceding siblings ...)
2025-08-11 9:00 ` [Patch v2 4/6] perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag Dapeng Mi
@ 2025-08-11 9:00 ` Dapeng Mi
2025-08-11 9:00 ` [Patch v2 6/6] perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK Dapeng Mi
5 siblings, 0 replies; 16+ messages in thread
From: Dapeng Mi @ 2025-08-11 9:00 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi, Yi Lai
The macro GLOBAL_CTRL_EN_PERF_METRICS is defined as 48 instead of
BIT_ULL(48), which is inconsistent with other similar macros. This makes
the macro quite easy to misuse, since users assume it is a bit mask just
like the other similar macros.
Thus, change GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48) and eliminate
this potential misuse.
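To illustrate the misuse the old definition invites (sketch):
        /* old: GLOBAL_CTRL_EN_PERF_METRICS == 48 is a bit index, so
         * this sets bits 4 and 5 instead of bit 48 */
        x86_pmu.intel_ctrl |= GLOBAL_CTRL_EN_PERF_METRICS;
        /* callers had to remember to shift it themselves */
        x86_pmu.intel_ctrl |= 1ULL << GLOBAL_CTRL_EN_PERF_METRICS;
        /* new: GLOBAL_CTRL_EN_PERF_METRICS == BIT_ULL(48), so the
         * natural usage just works */
        x86_pmu.intel_ctrl |= GLOBAL_CTRL_EN_PERF_METRICS;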
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
---
arch/x86/events/intel/core.c | 8 ++++----
arch/x86/include/asm/perf_event.h | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 386717b75a09..cdd10370ed95 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5323,9 +5323,9 @@ static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu)
0, x86_pmu_num_counters(&pmu->pmu), 0, 0);
if (pmu->intel_cap.perf_metrics)
- pmu->intel_ctrl |= 1ULL << GLOBAL_CTRL_EN_PERF_METRICS;
+ pmu->intel_ctrl |= GLOBAL_CTRL_EN_PERF_METRICS;
else
- pmu->intel_ctrl &= ~(1ULL << GLOBAL_CTRL_EN_PERF_METRICS);
+ pmu->intel_ctrl &= ~GLOBAL_CTRL_EN_PERF_METRICS;
intel_pmu_check_event_constraints(pmu->event_constraints,
pmu->cntr_mask64,
@@ -5460,7 +5460,7 @@ static void intel_pmu_cpu_starting(int cpu)
rdmsrq(MSR_IA32_PERF_CAPABILITIES, perf_cap.capabilities);
if (!perf_cap.perf_metrics) {
x86_pmu.intel_cap.perf_metrics = 0;
- x86_pmu.intel_ctrl &= ~(1ULL << GLOBAL_CTRL_EN_PERF_METRICS);
+ x86_pmu.intel_ctrl &= ~GLOBAL_CTRL_EN_PERF_METRICS;
}
}
@@ -7794,7 +7794,7 @@ __init int intel_pmu_init(void)
}
if (!is_hybrid() && x86_pmu.intel_cap.perf_metrics)
- x86_pmu.intel_ctrl |= 1ULL << GLOBAL_CTRL_EN_PERF_METRICS;
+ x86_pmu.intel_ctrl |= GLOBAL_CTRL_EN_PERF_METRICS;
if (x86_pmu.intel_cap.pebs_timing_info)
x86_pmu.flags |= PMU_FL_RETIRE_LATENCY;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 70d1d94aca7e..f8247ac276c4 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -430,7 +430,7 @@ static inline bool is_topdown_idx(int idx)
#define GLOBAL_STATUS_TRACE_TOPAPMI BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
#define GLOBAL_STATUS_PERF_METRICS_OVF_BIT 48
-#define GLOBAL_CTRL_EN_PERF_METRICS 48
+#define GLOBAL_CTRL_EN_PERF_METRICS BIT_ULL(48)
/*
* We model guest LBR event tracing as another fixed-mode PMC like BTS.
*
--
2.34.1
* [Patch v2 6/6] perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK
2025-08-11 9:00 [Patch v2 0/6] x86 perf bug fixes and optimization Dapeng Mi
` (4 preceding siblings ...)
2025-08-11 9:00 ` [Patch v2 5/6] perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48) Dapeng Mi
@ 2025-08-11 9:00 ` Dapeng Mi
2025-08-12 0:00 ` Liang, Kan
5 siblings, 1 reply; 16+ messages in thread
From: Dapeng Mi @ 2025-08-11 9:00 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi, Yi Lai
ICL_FIXED_0_ADAPTIVE is missing from INTEL_FIXED_BITS_MASK. Add it and
opportunistically refine the fixed counter enabling code.
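The effect on the fixed counter enable path, roughly (sketch mirroring
the diff below):
        /* clear every previously programmed control bit for this fixed
         * counter, including a possibly stale ICL_FIXED_0_ADAPTIVE bit,
         * before setting the freshly computed bits */
        cpuc->fixed_ctrl_val &= ~intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK);
        cpuc->fixed_ctrl_val |= bits;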
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
---
arch/x86/events/intel/core.c | 10 +++-------
arch/x86/include/asm/perf_event.h | 6 +++++-
arch/x86/kvm/pmu.h | 2 +-
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index cdd10370ed95..1a91b527d3c5 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2849,8 +2849,8 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
- u64 mask, bits = 0;
int idx = hwc->idx;
+ u64 bits = 0;
if (is_topdown_idx(idx)) {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -2889,14 +2889,10 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
idx -= INTEL_PMC_IDX_FIXED;
bits = intel_fixed_bits_by_idx(idx, bits);
- mask = intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK);
-
- if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip) {
+ if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip)
bits |= intel_fixed_bits_by_idx(idx, ICL_FIXED_0_ADAPTIVE);
- mask |= intel_fixed_bits_by_idx(idx, ICL_FIXED_0_ADAPTIVE);
- }
- cpuc->fixed_ctrl_val &= ~mask;
+ cpuc->fixed_ctrl_val &= ~intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK);
cpuc->fixed_ctrl_val |= bits;
}
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index f8247ac276c4..49a4d442f3fc 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -35,7 +35,6 @@
#define ARCH_PERFMON_EVENTSEL_EQ (1ULL << 36)
#define ARCH_PERFMON_EVENTSEL_UMASK2 (0xFFULL << 40)
-#define INTEL_FIXED_BITS_MASK 0xFULL
#define INTEL_FIXED_BITS_STRIDE 4
#define INTEL_FIXED_0_KERNEL (1ULL << 0)
#define INTEL_FIXED_0_USER (1ULL << 1)
@@ -48,6 +47,11 @@
#define ICL_EVENTSEL_ADAPTIVE (1ULL << 34)
#define ICL_FIXED_0_ADAPTIVE (1ULL << 32)
+#define INTEL_FIXED_BITS_MASK \
+ (INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER | \
+ INTEL_FIXED_0_ANYTHREAD | INTEL_FIXED_0_ENABLE_PMI | \
+ ICL_FIXED_0_ADAPTIVE)
+
#define intel_fixed_bits_by_idx(_idx, _bits) \
((_bits) << ((_idx) * INTEL_FIXED_BITS_STRIDE))
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index ad89d0bd6005..103604c4b33b 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -13,7 +13,7 @@
#define MSR_IA32_MISC_ENABLE_PMU_RO_MASK (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL | \
MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
-/* retrieve the 4 bits for EN and PMI out of IA32_FIXED_CTR_CTRL */
+/* retrieve a fixed counter bits out of IA32_FIXED_CTR_CTRL */
#define fixed_ctrl_field(ctrl_reg, idx) \
(((ctrl_reg) >> ((idx) * INTEL_FIXED_BITS_STRIDE)) & INTEL_FIXED_BITS_MASK)
--
2.34.1
* Re: [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
2025-08-11 9:00 ` [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it Dapeng Mi
@ 2025-08-11 23:32 ` Liang, Kan
2025-08-12 2:33 ` Mi, Dapeng
2025-08-19 8:45 ` Peter Zijlstra
1 sibling, 1 reply; 16+ messages in thread
From: Liang, Kan @ 2025-08-11 23:32 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, kernel test robot
On 2025-08-11 2:00 a.m., Dapeng Mi wrote:
> The PMI handler could disable some events as the interrupt throttling
> and clear the corresponding items in cpuc->events[] array.
>
> perf_event_overflow()
> -> __perf_event_overflow()
> ->__perf_event_account_interrupt()
> -> perf_event_throttle_group()
> -> perf_event_throttle()
> -> event->pmu->stop()
> -> x86_pmu_stop()
>
> Moreover PMI is NMI on x86 platform and it could interrupt other perf
> code like setup_pebs_adaptive_sample_data().
The PMU is disabled when draining the PEBS records. I don't think a PMI
can be triggered in the setup_pebs_adaptive_sample_data().
> So once PMI handling
> finishes and returns into setup_pebs_adaptive_sample_data() and it could
> find the cpuc->events[*] becomes NULL and accessing this NULL pointer
> triggers an invalid memory access and leads to kernel crashes eventually.
The commit 9734e25fbf5a stops all events in a group when processing the
last records of the leader event. For large PEBS, it's possible that
there are still some records for member events left. It should be the
root cause of the NULL pointer. If so, we should drain those records as
well.
Thanks,
Kan>
> Thus add NULL check before accessing cpuc->events[*] pointer.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
> Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> ---
> arch/x86/events/core.c | 3 +++
> arch/x86/events/intel/core.c | 6 +++++-
> arch/x86/events/intel/ds.c | 13 ++++++-------
> 3 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 7610f26dfbd9..f0a3bc57157d 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1711,6 +1711,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> continue;
>
> event = cpuc->events[idx];
> + if (!event)
> + continue;
> +
> last_period = event->hw.last_period;
>
> val = static_call(x86_pmu_update)(event);
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 15da60cf69f2..386717b75a09 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2718,6 +2718,8 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
> if (!is_topdown_idx(idx))
> continue;
> other = cpuc->events[idx];
> + if (!other)
> + continue;
> other->hw.saved_slots = slots;
> other->hw.saved_metric = metrics;
> }
> @@ -2761,6 +2763,8 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
> if (!is_topdown_idx(idx))
> continue;
> other = cpuc->events[idx];
> + if (!other)
> + continue;
> __icl_update_topdown_event(other, slots, metrics,
> event ? event->hw.saved_slots : 0,
> event ? event->hw.saved_metric : 0);
> @@ -3138,7 +3142,7 @@ static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
>
> for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs, X86_PMC_IDX_MAX) {
> event = cpuc->events[bit];
> - if (!event->attr.precise_ip)
> + if (!event || !event->attr.precise_ip)
> continue;
>
> perf_sample_data_init(data, 0, event->hw.last_period);
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index c0b7ac1c7594..b23c49e2e06f 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -2480,6 +2480,8 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, u64
> */
> for_each_set_bit(bit, (unsigned long *)&pebs_enabled, X86_PMC_IDX_MAX) {
> event = cpuc->events[bit];
> + if (!event)
> + continue;
> if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
> intel_pmu_save_and_restart_reload(event, 0);
> }
> @@ -2579,10 +2581,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
> continue;
>
> event = cpuc->events[bit];
> - if (WARN_ON_ONCE(!event))
> - continue;
> -
> - if (WARN_ON_ONCE(!event->attr.precise_ip))
> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
> continue;
>
> /* log dropped samples number */
> @@ -2645,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
> pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
> for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
> event = cpuc->events[bit];
> -
> - if (WARN_ON_ONCE(!event) ||
> - WARN_ON_ONCE(!event->attr.precise_ip))
> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
> continue;
>
> if (counts[bit]++) {
> @@ -2663,6 +2660,8 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
> continue;
>
> event = cpuc->events[bit];
> + if (!event)
> + continue;
>
> __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
> counts[bit], setup_pebs_adaptive_sample_data);
* Re: [Patch v2 6/6] perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK
2025-08-11 9:00 ` [Patch v2 6/6] perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK Dapeng Mi
@ 2025-08-12 0:00 ` Liang, Kan
2025-08-12 2:54 ` Mi, Dapeng
0 siblings, 1 reply; 16+ messages in thread
From: Liang, Kan @ 2025-08-12 0:00 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Yi Lai
On 2025-08-11 2:00 a.m., Dapeng Mi wrote:
> ICL_FIXED_0_ADAPTIVE is missed to be added into INTEL_FIXED_BITS_MASK,
> add it and opportunistically refine fixed counter enabling code.
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Tested-by: Yi Lai <yi1.lai@intel.com>
> ---
> arch/x86/events/intel/core.c | 10 +++-------
> arch/x86/include/asm/perf_event.h | 6 +++++-
> arch/x86/kvm/pmu.h | 2 +-
> 3 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index cdd10370ed95..1a91b527d3c5 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2849,8 +2849,8 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
> {
> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> struct hw_perf_event *hwc = &event->hw;
> - u64 mask, bits = 0;
> int idx = hwc->idx;
> + u64 bits = 0;
>
> if (is_topdown_idx(idx)) {
> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> @@ -2889,14 +2889,10 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
>
> idx -= INTEL_PMC_IDX_FIXED;
> bits = intel_fixed_bits_by_idx(idx, bits);
> - mask = intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK);
> -
> - if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip) {
> + if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip)
> bits |= intel_fixed_bits_by_idx(idx, ICL_FIXED_0_ADAPTIVE);
> - mask |= intel_fixed_bits_by_idx(idx, ICL_FIXED_0_ADAPTIVE);
> - }
>
This changes the behavior. The mask will always include the ADAPTIVE bit
even on a platform which doesn't support adaptive pebs.
The description doesn't mention why it's OK.
Thanks,
Kan
> - cpuc->fixed_ctrl_val &= ~mask;
> + cpuc->fixed_ctrl_val &= ~intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK);
> cpuc->fixed_ctrl_val |= bits;
> }
>
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index f8247ac276c4..49a4d442f3fc 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -35,7 +35,6 @@
> #define ARCH_PERFMON_EVENTSEL_EQ (1ULL << 36)
> #define ARCH_PERFMON_EVENTSEL_UMASK2 (0xFFULL << 40)
>
> -#define INTEL_FIXED_BITS_MASK 0xFULL
> #define INTEL_FIXED_BITS_STRIDE 4
> #define INTEL_FIXED_0_KERNEL (1ULL << 0)
> #define INTEL_FIXED_0_USER (1ULL << 1)
> @@ -48,6 +47,11 @@
> #define ICL_EVENTSEL_ADAPTIVE (1ULL << 34)
> #define ICL_FIXED_0_ADAPTIVE (1ULL << 32)
>
> +#define INTEL_FIXED_BITS_MASK \
> + (INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER | \
> + INTEL_FIXED_0_ANYTHREAD | INTEL_FIXED_0_ENABLE_PMI | \
> + ICL_FIXED_0_ADAPTIVE)
> +
> #define intel_fixed_bits_by_idx(_idx, _bits) \
> ((_bits) << ((_idx) * INTEL_FIXED_BITS_STRIDE))
>
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index ad89d0bd6005..103604c4b33b 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -13,7 +13,7 @@
> #define MSR_IA32_MISC_ENABLE_PMU_RO_MASK (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL | \
> MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
>
> -/* retrieve the 4 bits for EN and PMI out of IA32_FIXED_CTR_CTRL */
> +/* retrieve a fixed counter bits out of IA32_FIXED_CTR_CTRL */
> #define fixed_ctrl_field(ctrl_reg, idx) \
> (((ctrl_reg) >> ((idx) * INTEL_FIXED_BITS_STRIDE)) & INTEL_FIXED_BITS_MASK)
>
* Re: [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
2025-08-11 23:32 ` Liang, Kan
@ 2025-08-12 2:33 ` Mi, Dapeng
2025-08-12 18:16 ` Liang, Kan
0 siblings, 1 reply; 16+ messages in thread
From: Mi, Dapeng @ 2025-08-12 2:33 UTC (permalink / raw)
To: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, kernel test robot
On 8/12/2025 7:32 AM, Liang, Kan wrote:
>
> On 2025-08-11 2:00 a.m., Dapeng Mi wrote:
>> The PMI handler could disable some events as the interrupt throttling
>> and clear the corresponding items in cpuc->events[] array.
>>
>> perf_event_overflow()
>> -> __perf_event_overflow()
>> ->__perf_event_account_interrupt()
>> -> perf_event_throttle_group()
>> -> perf_event_throttle()
>> -> event->pmu->stop()
>> -> x86_pmu_stop()
>>
>> Moreover PMI is NMI on x86 platform and it could interrupt other perf
>> code like setup_pebs_adaptive_sample_data().
> The PMU is disabled when draining the PEBS records. I don't think a PMI
> can be triggered in the setup_pebs_adaptive_sample_data().
Besides the NMI handler, the drain_pebs helper intel_pmu_drain_pebs_buffer()
could be called in many places, like context switch and PEBS event
disabling. All these places could be interrupted by the NMI handler, and
then trigger this NULL pointer access issue.
>
>> So once PMI handling
>> finishes and returns into setup_pebs_adaptive_sample_data() and it could
>> find the cpuc->events[*] becomes NULL and accessing this NULL pointer
>> triggers an invalid memory access and leads to kernel crashes eventually.
> The commit 9734e25fbf5a stops all events in a group when processing the
> last records of the leader event. For large PEBS, it's possible that
> there are still some records for member events left. It should be the
> root cause of the NULL pointer. If so, we should drain those records as
> well.
The leftover PEBS records would always be cleared by
intel_pmu_drain_large_pebs() when disabling a PEBS event.
>
> Thanks,
> Kan>
>> Thus add NULL check before accessing cpuc->events[*] pointer.
>>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>> Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
>> Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Tested-by: kernel test robot <oliver.sang@intel.com>
>> ---
>> arch/x86/events/core.c | 3 +++
>> arch/x86/events/intel/core.c | 6 +++++-
>> arch/x86/events/intel/ds.c | 13 ++++++-------
>> 3 files changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 7610f26dfbd9..f0a3bc57157d 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -1711,6 +1711,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>> continue;
>>
>> event = cpuc->events[idx];
>> + if (!event)
>> + continue;
>> +
>> last_period = event->hw.last_period;
>>
>> val = static_call(x86_pmu_update)(event);
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 15da60cf69f2..386717b75a09 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2718,6 +2718,8 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
>> if (!is_topdown_idx(idx))
>> continue;
>> other = cpuc->events[idx];
>> + if (!other)
>> + continue;
>> other->hw.saved_slots = slots;
>> other->hw.saved_metric = metrics;
>> }
>> @@ -2761,6 +2763,8 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
>> if (!is_topdown_idx(idx))
>> continue;
>> other = cpuc->events[idx];
>> + if (!other)
>> + continue;
>> __icl_update_topdown_event(other, slots, metrics,
>> event ? event->hw.saved_slots : 0,
>> event ? event->hw.saved_metric : 0);
>> @@ -3138,7 +3142,7 @@ static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
>>
>> for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs, X86_PMC_IDX_MAX) {
>> event = cpuc->events[bit];
>> - if (!event->attr.precise_ip)
>> + if (!event || !event->attr.precise_ip)
>> continue;
>>
>> perf_sample_data_init(data, 0, event->hw.last_period);
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index c0b7ac1c7594..b23c49e2e06f 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -2480,6 +2480,8 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, u64
>> */
>> for_each_set_bit(bit, (unsigned long *)&pebs_enabled, X86_PMC_IDX_MAX) {
>> event = cpuc->events[bit];
>> + if (!event)
>> + continue;
>> if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>> intel_pmu_save_and_restart_reload(event, 0);
>> }
>> @@ -2579,10 +2581,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
>> continue;
>>
>> event = cpuc->events[bit];
>> - if (WARN_ON_ONCE(!event))
>> - continue;
>> -
>> - if (WARN_ON_ONCE(!event->attr.precise_ip))
>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>> continue;
>>
>> /* log dropped samples number */
>> @@ -2645,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>> pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
>> for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
>> event = cpuc->events[bit];
>> -
>> - if (WARN_ON_ONCE(!event) ||
>> - WARN_ON_ONCE(!event->attr.precise_ip))
>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>> continue;
>>
>> if (counts[bit]++) {
>> @@ -2663,6 +2660,8 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>> continue;
>>
>> event = cpuc->events[bit];
>> + if (!event)
>> + continue;
>>
>> __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
>> counts[bit], setup_pebs_adaptive_sample_data);
* Re: [Patch v2 6/6] perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK
2025-08-12 0:00 ` Liang, Kan
@ 2025-08-12 2:54 ` Mi, Dapeng
0 siblings, 0 replies; 16+ messages in thread
From: Mi, Dapeng @ 2025-08-12 2:54 UTC (permalink / raw)
To: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Yi Lai
On 8/12/2025 8:00 AM, Liang, Kan wrote:
>
> On 2025-08-11 2:00 a.m., Dapeng Mi wrote:
>> ICL_FIXED_0_ADAPTIVE is missed to be added into INTEL_FIXED_BITS_MASK,
>> add it and opportunistically refine fixed counter enabling code.
>>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Tested-by: Yi Lai <yi1.lai@intel.com>
>> ---
>> arch/x86/events/intel/core.c | 10 +++-------
>> arch/x86/include/asm/perf_event.h | 6 +++++-
>> arch/x86/kvm/pmu.h | 2 +-
>> 3 files changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index cdd10370ed95..1a91b527d3c5 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2849,8 +2849,8 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
>> {
>> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> struct hw_perf_event *hwc = &event->hw;
>> - u64 mask, bits = 0;
>> int idx = hwc->idx;
>> + u64 bits = 0;
>>
>> if (is_topdown_idx(idx)) {
>> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> @@ -2889,14 +2889,10 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
>>
>> idx -= INTEL_PMC_IDX_FIXED;
>> bits = intel_fixed_bits_by_idx(idx, bits);
>> - mask = intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK);
>> -
>> - if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip) {
>> + if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip)
>> bits |= intel_fixed_bits_by_idx(idx, ICL_FIXED_0_ADAPTIVE);
>> - mask |= intel_fixed_bits_by_idx(idx, ICL_FIXED_0_ADAPTIVE);
>> - }
>>
> This changes the behavior. The mask will always include the ADAPTIVE bit
> even on a platform which doesn't support adaptive pebs.
> The description doesn't mention why it's OK.
The mask is used to clear the old fixed bits; the ICL_FIXED_0_ADAPTIVE bit
needs to be cleared regardless of whether the platform really supports it.
I'll enhance the commit message to describe this.
>
> Thanks,
> Kan
>> - cpuc->fixed_ctrl_val &= ~mask;
>> + cpuc->fixed_ctrl_val &= ~intel_fixed_bits_by_idx(idx, INTEL_FIXED_BITS_MASK);
>> cpuc->fixed_ctrl_val |= bits;
>> }
>>
>> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
>> index f8247ac276c4..49a4d442f3fc 100644
>> --- a/arch/x86/include/asm/perf_event.h
>> +++ b/arch/x86/include/asm/perf_event.h
>> @@ -35,7 +35,6 @@
>> #define ARCH_PERFMON_EVENTSEL_EQ (1ULL << 36)
>> #define ARCH_PERFMON_EVENTSEL_UMASK2 (0xFFULL << 40)
>>
>> -#define INTEL_FIXED_BITS_MASK 0xFULL
>> #define INTEL_FIXED_BITS_STRIDE 4
>> #define INTEL_FIXED_0_KERNEL (1ULL << 0)
>> #define INTEL_FIXED_0_USER (1ULL << 1)
>> @@ -48,6 +47,11 @@
>> #define ICL_EVENTSEL_ADAPTIVE (1ULL << 34)
>> #define ICL_FIXED_0_ADAPTIVE (1ULL << 32)
>>
>> +#define INTEL_FIXED_BITS_MASK \
>> + (INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER | \
>> + INTEL_FIXED_0_ANYTHREAD | INTEL_FIXED_0_ENABLE_PMI | \
>> + ICL_FIXED_0_ADAPTIVE)
>> +
>> #define intel_fixed_bits_by_idx(_idx, _bits) \
>> ((_bits) << ((_idx) * INTEL_FIXED_BITS_STRIDE))
>>
>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
>> index ad89d0bd6005..103604c4b33b 100644
>> --- a/arch/x86/kvm/pmu.h
>> +++ b/arch/x86/kvm/pmu.h
>> @@ -13,7 +13,7 @@
>> #define MSR_IA32_MISC_ENABLE_PMU_RO_MASK (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL | \
>> MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
>>
>> -/* retrieve the 4 bits for EN and PMI out of IA32_FIXED_CTR_CTRL */
>> +/* retrieve a fixed counter bits out of IA32_FIXED_CTR_CTRL */
>> #define fixed_ctrl_field(ctrl_reg, idx) \
>> (((ctrl_reg) >> ((idx) * INTEL_FIXED_BITS_STRIDE)) & INTEL_FIXED_BITS_MASK)
>>
>
* Re: [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
2025-08-12 2:33 ` Mi, Dapeng
@ 2025-08-12 18:16 ` Liang, Kan
2025-08-19 8:02 ` Mi, Dapeng
0 siblings, 1 reply; 16+ messages in thread
From: Liang, Kan @ 2025-08-12 18:16 UTC (permalink / raw)
To: Mi, Dapeng, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, kernel test robot
On 2025-08-11 7:33 p.m., Mi, Dapeng wrote:
>
> On 8/12/2025 7:32 AM, Liang, Kan wrote:
>>
>> On 2025-08-11 2:00 a.m., Dapeng Mi wrote:
>>> The PMI handler could disable some events as the interrupt throttling
>>> and clear the corresponding items in cpuc->events[] array.
>>>
>>> perf_event_overflow()
>>> -> __perf_event_overflow()
>>> ->__perf_event_account_interrupt()
>>> -> perf_event_throttle_group()
>>> -> perf_event_throttle()
>>> -> event->pmu->stop()
>>> -> x86_pmu_stop()
>>>
>>> Moreover PMI is NMI on x86 platform and it could interrupt other perf
>>> code like setup_pebs_adaptive_sample_data().
>> The PMU is disabled when draining the PEBS records. I don't think a PMI
>> can be triggered in the setup_pebs_adaptive_sample_data().
>
> Besides in NMI handler, the drain_pebs helper intel_pmu_drain_pebs_buffer()
> could be called in many places, like context switch and PEBS event
> disabling.
Yes
> All these places could be interrupted by the NMI handler, and
No. Before draining the buffer, the PMU must be stopped. No NMI could be
triggered.
> then the trigger this NULL pointer access issue.
>
>
>>
>>> So once PMI handling
>>> finishes and returns into setup_pebs_adaptive_sample_data() and it could
>>> find the cpuc->events[*] becomes NULL and accessing this NULL pointer
>>> triggers an invalid memory access and leads to kernel crashes eventually.
>> The commit 9734e25fbf5a stops all events in a group when processing the
>> last records of the leader event. For large PEBS, it's possible that
>> there are still some records for member events left. It should be the
>> root cause of the NULL pointer. If so, we should drain those records as
>> well.
>
> The left PEBS record would always be cleared by
> intel_pmu_drain_large_pebs() when disabling PEBS event.
When stopping the event.
Then you should only need the check in intel_pmu_drain_pebs_icl(), since
the stop only happens when handling the last event.
Thanks,
Kan>
>
>>
>> Thanks,
>> Kan>
>>> Thus add NULL check before accessing cpuc->events[*] pointer.
>>>
>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>> Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
>>> Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> Tested-by: kernel test robot <oliver.sang@intel.com>
>>> ---
>>> arch/x86/events/core.c | 3 +++
>>> arch/x86/events/intel/core.c | 6 +++++-
>>> arch/x86/events/intel/ds.c | 13 ++++++-------
>>> 3 files changed, 14 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>>> index 7610f26dfbd9..f0a3bc57157d 100644
>>> --- a/arch/x86/events/core.c
>>> +++ b/arch/x86/events/core.c
>>> @@ -1711,6 +1711,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>>> continue;
>>>
>>> event = cpuc->events[idx];
>>> + if (!event)
>>> + continue;
>>> +
>>> last_period = event->hw.last_period;
>>>
>>> val = static_call(x86_pmu_update)(event);
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 15da60cf69f2..386717b75a09 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -2718,6 +2718,8 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
>>> if (!is_topdown_idx(idx))
>>> continue;
>>> other = cpuc->events[idx];
>>> + if (!other)
>>> + continue;
>>> other->hw.saved_slots = slots;
>>> other->hw.saved_metric = metrics;
>>> }
>>> @@ -2761,6 +2763,8 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
>>> if (!is_topdown_idx(idx))
>>> continue;
>>> other = cpuc->events[idx];
>>> + if (!other)
>>> + continue;
>>> __icl_update_topdown_event(other, slots, metrics,
>>> event ? event->hw.saved_slots : 0,
>>> event ? event->hw.saved_metric : 0);
>>> @@ -3138,7 +3142,7 @@ static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
>>>
>>> for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs, X86_PMC_IDX_MAX) {
>>> event = cpuc->events[bit];
>>> - if (!event->attr.precise_ip)
>>> + if (!event || !event->attr.precise_ip)
>>> continue;
>>>
>>> perf_sample_data_init(data, 0, event->hw.last_period);
>>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>>> index c0b7ac1c7594..b23c49e2e06f 100644
>>> --- a/arch/x86/events/intel/ds.c
>>> +++ b/arch/x86/events/intel/ds.c
>>> @@ -2480,6 +2480,8 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, u64
>>> */
>>> for_each_set_bit(bit, (unsigned long *)&pebs_enabled, X86_PMC_IDX_MAX) {
>>> event = cpuc->events[bit];
>>> + if (!event)
>>> + continue;
>>> if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>>> intel_pmu_save_and_restart_reload(event, 0);
>>> }
>>> @@ -2579,10 +2581,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
>>> continue;
>>>
>>> event = cpuc->events[bit];
>>> - if (WARN_ON_ONCE(!event))
>>> - continue;
>>> -
>>> - if (WARN_ON_ONCE(!event->attr.precise_ip))
>>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>>> continue;
>>>
>>> /* log dropped samples number */
>>> @@ -2645,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>>> pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
>>> for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
>>> event = cpuc->events[bit];
>>> -
>>> - if (WARN_ON_ONCE(!event) ||
>>> - WARN_ON_ONCE(!event->attr.precise_ip))
>>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>>> continue;
>>>
>>> if (counts[bit]++) {
>>> @@ -2663,6 +2660,8 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>>> continue;
>>>
>>> event = cpuc->events[bit];
>>> + if (!event)
>>> + continue;
>>>
>>> __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
>>> counts[bit], setup_pebs_adaptive_sample_data);
>
* Re: [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
2025-08-12 18:16 ` Liang, Kan
@ 2025-08-19 8:02 ` Mi, Dapeng
0 siblings, 0 replies; 16+ messages in thread
From: Mi, Dapeng @ 2025-08-19 8:02 UTC (permalink / raw)
To: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, kernel test robot
On 8/13/2025 2:16 AM, Liang, Kan wrote:
>
> On 2025-08-11 7:33 p.m., Mi, Dapeng wrote:
>> On 8/12/2025 7:32 AM, Liang, Kan wrote:
>>> On 2025-08-11 2:00 a.m., Dapeng Mi wrote:
>>>> The PMI handler could disable some events as the interrupt throttling
>>>> and clear the corresponding items in cpuc->events[] array.
>>>>
>>>> perf_event_overflow()
>>>> -> __perf_event_overflow()
>>>> ->__perf_event_account_interrupt()
>>>> -> perf_event_throttle_group()
>>>> -> perf_event_throttle()
>>>> -> event->pmu->stop()
>>>> -> x86_pmu_stop()
>>>>
>>>> Moreover PMI is NMI on x86 platform and it could interrupt other perf
>>>> code like setup_pebs_adaptive_sample_data().
>>> The PMU is disabled when draining the PEBS records. I don't think a PMI
>>> can be triggered in the setup_pebs_adaptive_sample_data().
>> Besides in NMI handler, the drain_pebs helper intel_pmu_drain_pebs_buffer()
>> could be called in many places, like context switch and PEBS event
>> disabling.
> Yes
>
>> All these places could be interrupted by the NMI handler, and
> No. Before draining the buffer, the PMU must be stopped. No NMI could be
> triggered.
>
>> then the trigger this NULL pointer access issue.
>>
>>
>>>> So once PMI handling
>>>> finishes and returns into setup_pebs_adaptive_sample_data() and it could
>>>> find the cpuc->events[*] becomes NULL and accessing this NULL pointer
>>>> triggers an invalid memory access and leads to kernel crashes eventually.
>>> The commit 9734e25fbf5a stops all events in a group when processing the
>>> last records of the leader event. For large PEBS, it's possible that
>>> there are still some records for member events left. It should be the
>>> root cause of the NULL pointer. If so, we should drain those records as
>>> well.
>> The left PEBS record would always be cleared by
>> intel_pmu_drain_large_pebs() when disabling PEBS event.
> When stopping the event.
>
> Then you should only need the check in intel_pmu_drain_pebs_icl(), since
> the stop only happens when handling the last event.
Yes, it's enough to only add the NULL check in intel_pmu_drain_pebs_icl(). Thanks.
>
> Thanks,
> Kan>
>>> Thanks,
>>> Kan>
>>>> Thus add NULL check before accessing cpuc->events[*] pointer.
>>>>
>>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>>> Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
>>>> Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
>>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>>> Tested-by: kernel test robot <oliver.sang@intel.com>
>>>> ---
>>>> arch/x86/events/core.c | 3 +++
>>>> arch/x86/events/intel/core.c | 6 +++++-
>>>> arch/x86/events/intel/ds.c | 13 ++++++-------
>>>> 3 files changed, 14 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>>>> index 7610f26dfbd9..f0a3bc57157d 100644
>>>> --- a/arch/x86/events/core.c
>>>> +++ b/arch/x86/events/core.c
>>>> @@ -1711,6 +1711,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>>>> continue;
>>>>
>>>> event = cpuc->events[idx];
>>>> + if (!event)
>>>> + continue;
>>>> +
>>>> last_period = event->hw.last_period;
>>>>
>>>> val = static_call(x86_pmu_update)(event);
>>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>>> index 15da60cf69f2..386717b75a09 100644
>>>> --- a/arch/x86/events/intel/core.c
>>>> +++ b/arch/x86/events/intel/core.c
>>>> @@ -2718,6 +2718,8 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
>>>> if (!is_topdown_idx(idx))
>>>> continue;
>>>> other = cpuc->events[idx];
>>>> + if (!other)
>>>> + continue;
>>>> other->hw.saved_slots = slots;
>>>> other->hw.saved_metric = metrics;
>>>> }
>>>> @@ -2761,6 +2763,8 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
>>>> if (!is_topdown_idx(idx))
>>>> continue;
>>>> other = cpuc->events[idx];
>>>> + if (!other)
>>>> + continue;
>>>> __icl_update_topdown_event(other, slots, metrics,
>>>> event ? event->hw.saved_slots : 0,
>>>> event ? event->hw.saved_metric : 0);
>>>> @@ -3138,7 +3142,7 @@ static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
>>>>
>>>> for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs, X86_PMC_IDX_MAX) {
>>>> event = cpuc->events[bit];
>>>> - if (!event->attr.precise_ip)
>>>> + if (!event || !event->attr.precise_ip)
>>>> continue;
>>>>
>>>> perf_sample_data_init(data, 0, event->hw.last_period);
>>>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>>>> index c0b7ac1c7594..b23c49e2e06f 100644
>>>> --- a/arch/x86/events/intel/ds.c
>>>> +++ b/arch/x86/events/intel/ds.c
>>>> @@ -2480,6 +2480,8 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, u64
>>>> */
>>>> for_each_set_bit(bit, (unsigned long *)&pebs_enabled, X86_PMC_IDX_MAX) {
>>>> event = cpuc->events[bit];
>>>> + if (!event)
>>>> + continue;
>>>> if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>>>> intel_pmu_save_and_restart_reload(event, 0);
>>>> }
>>>> @@ -2579,10 +2581,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
>>>> continue;
>>>>
>>>> event = cpuc->events[bit];
>>>> - if (WARN_ON_ONCE(!event))
>>>> - continue;
>>>> -
>>>> - if (WARN_ON_ONCE(!event->attr.precise_ip))
>>>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>>>> continue;
>>>>
>>>> /* log dropped samples number */
>>>> @@ -2645,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>>>> pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
>>>> for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
>>>> event = cpuc->events[bit];
>>>> -
>>>> - if (WARN_ON_ONCE(!event) ||
>>>> - WARN_ON_ONCE(!event->attr.precise_ip))
>>>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>>>> continue;
>>>>
>>>> if (counts[bit]++) {
>>>> @@ -2663,6 +2660,8 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>>>> continue;
>>>>
>>>> event = cpuc->events[bit];
>>>> + if (!event)
>>>> + continue;
>>>>
>>>> __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
>>>> counts[bit], setup_pebs_adaptive_sample_data);
>
* Re: [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
2025-08-11 9:00 ` [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it Dapeng Mi
2025-08-11 23:32 ` Liang, Kan
@ 2025-08-19 8:45 ` Peter Zijlstra
2025-08-19 9:21 ` Mi, Dapeng
1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2025-08-19 8:45 UTC (permalink / raw)
To: Dapeng Mi
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi,
kernel test robot
On Mon, Aug 11, 2025 at 05:00:31PM +0800, Dapeng Mi wrote:
> The PMI handler could disable some events as part of interrupt throttling
> and clear the corresponding entries in the cpuc->events[] array.
>
> perf_event_overflow()
> -> __perf_event_overflow()
> ->__perf_event_account_interrupt()
> -> perf_event_throttle_group()
> -> perf_event_throttle()
> -> event->pmu->stop()
> -> x86_pmu_stop()
>
> Moreover, the PMI is an NMI on x86, so it could interrupt other perf
> code such as setup_pebs_adaptive_sample_data().
Uhh, how? AFAICT we only do drain_pebs() from the PMI itself, or disable
the PMU first by clearing GLOBAL_CTRL.
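For reference, the GLOBAL_CTRL clearing mentioned above is done by
__intel_pmu_disable_all(); a simplified sketch follows (paraphrased from
arch/x86/events/intel/core.c; exact helper and MSR-accessor names may differ
between kernel versions):

static __always_inline void __intel_pmu_disable_all(bool bts)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

	/* Stop all counters before any PEBS drain / overflow handling runs. */
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);

	if (bts && test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask))
		intel_pmu_disable_bts();
}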
> So once PMI handling
> finishes and returns into setup_pebs_adaptive_sample_data(), it could
> find that cpuc->events[*] has become NULL; dereferencing this NULL pointer
> triggers an invalid memory access and eventually leads to a kernel crash.
>
> Thus add a NULL check before accessing the cpuc->events[*] pointer.
This doesn't seem fully thought through.
If we do this NULL check, then the active_mask bittest is completely
superfluous and can be removed, no?
Also, what about this race:
event = cpuc->events[idx]; // !NULL;
<PMI>
x86_pmu_stop()
cpuc->events[idx] = NULL;
</PMI>
... uses event
Worse, since it is a 'normal' load, it is permitted for the compiler to
re-issue the load, at which point it will still explode. IOW, it should
be READ_ONCE(), *if* we can live with the above race at all. Can we?
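To illustrate, a minimal sketch of what a READ_ONCE()-based lookup in the
x86_pmu_handle_irq() loop could look like (a hypothetical shape for
discussion, not the posted patch; loop bound and surrounding code are
paraphrased):

	for (idx = 0; idx < X86_PMC_IDX_MAX; idx++) {
		if (!test_bit(idx, cpuc->active_mask))
			continue;

		/* Read the pointer exactly once; a plain load may be re-issued. */
		event = READ_ONCE(cpuc->events[idx]);
		if (!event)
			continue;

		last_period = event->hw.last_period;
		val = static_call(x86_pmu_update)(event);
		/* ... remaining overflow handling ... */
	}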
First though, you need to explain how we get here. Because drain_pebs()
nesting would be *BAD*.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
> Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> ---
> arch/x86/events/core.c | 3 +++
> arch/x86/events/intel/core.c | 6 +++++-
> arch/x86/events/intel/ds.c | 13 ++++++-------
> 3 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 7610f26dfbd9..f0a3bc57157d 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1711,6 +1711,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> continue;
>
> event = cpuc->events[idx];
> + if (!event)
> + continue;
> +
> last_period = event->hw.last_period;
>
> val = static_call(x86_pmu_update)(event);
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 15da60cf69f2..386717b75a09 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2718,6 +2718,8 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
> if (!is_topdown_idx(idx))
> continue;
> other = cpuc->events[idx];
> + if (!other)
> + continue;
> other->hw.saved_slots = slots;
> other->hw.saved_metric = metrics;
> }
> @@ -2761,6 +2763,8 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
> if (!is_topdown_idx(idx))
> continue;
> other = cpuc->events[idx];
> + if (!other)
> + continue;
> __icl_update_topdown_event(other, slots, metrics,
> event ? event->hw.saved_slots : 0,
> event ? event->hw.saved_metric : 0);
> @@ -3138,7 +3142,7 @@ static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
>
> for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs, X86_PMC_IDX_MAX) {
> event = cpuc->events[bit];
> - if (!event->attr.precise_ip)
> + if (!event || !event->attr.precise_ip)
> continue;
>
> perf_sample_data_init(data, 0, event->hw.last_period);
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index c0b7ac1c7594..b23c49e2e06f 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -2480,6 +2480,8 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, u64
> */
> for_each_set_bit(bit, (unsigned long *)&pebs_enabled, X86_PMC_IDX_MAX) {
> event = cpuc->events[bit];
> + if (!event)
> + continue;
> if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
> intel_pmu_save_and_restart_reload(event, 0);
> }
> @@ -2579,10 +2581,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
> continue;
>
> event = cpuc->events[bit];
> - if (WARN_ON_ONCE(!event))
> - continue;
> -
> - if (WARN_ON_ONCE(!event->attr.precise_ip))
> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
> continue;
>
> /* log dropped samples number */
> @@ -2645,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
> pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
> for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
> event = cpuc->events[bit];
> -
> - if (WARN_ON_ONCE(!event) ||
> - WARN_ON_ONCE(!event->attr.precise_ip))
> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
> continue;
>
> if (counts[bit]++) {
> @@ -2663,6 +2660,8 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
> continue;
>
> event = cpuc->events[bit];
> + if (!event)
> + continue;
>
> __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
> counts[bit], setup_pebs_adaptive_sample_data);
> --
> 2.34.1
>
* Re: [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
2025-08-19 8:45 ` Peter Zijlstra
@ 2025-08-19 9:21 ` Mi, Dapeng
2025-08-19 9:26 ` Mi, Dapeng
0 siblings, 1 reply; 16+ messages in thread
From: Mi, Dapeng @ 2025-08-19 9:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi,
kernel test robot
On 8/19/2025 4:45 PM, Peter Zijlstra wrote:
> On Mon, Aug 11, 2025 at 05:00:31PM +0800, Dapeng Mi wrote:
>> The PMI handler could disable some events as part of interrupt throttling
>> and clear the corresponding entries in the cpuc->events[] array.
>>
>> perf_event_overflow()
>> -> __perf_event_overflow()
>> ->__perf_event_account_interrupt()
>> -> perf_event_throttle_group()
>> -> perf_event_throttle()
>> -> event->pmu->stop()
>> -> x86_pmu_stop()
>>
>> Moreover, the PMI is an NMI on x86, so it could interrupt other perf
>> code such as setup_pebs_adaptive_sample_data().
> Uhh, how? AFAICT we only do drain_pebs() from the PMI itself, or disable
> the PMU first by clearing GLOBAL_CTRL.
>
>> So once PMI handling
>> finishes and returns into setup_pebs_adaptive_sample_data(), it could
>> find that cpuc->events[*] has become NULL; dereferencing this NULL pointer
>> triggers an invalid memory access and eventually leads to a kernel crash.
>>
>> Thus add a NULL check before accessing the cpuc->events[*] pointer.
> This doesn't seem fully thought through.
>
> If we do this NULL check, then the active_mask bittest is completely
> superfluous and can be removed, no?
>
> Also, what about this race:
>
> event = cpuc->events[idx]; // !NULL;
> <PMI>
> x86_pmu_stop()
> cpuc->events[idx] = NULL;
> </PMI>
> ... uses event
>
> Worse, since it is a 'normal' load, it is permitted for the compiler to
> re-issue the load, at which point it will still explode. IOW, it should
> be READ_ONCE(), *if* we can live with the above race at all. Can we?
>
> First though, you need to explain how we get here. Because drain_pebs()
> nesting would be *BAD*.
I suppose I made a mistake in explaining why the issue happens. Since I
can't reproduce this issue locally (I synced with Oliver and the issue seems
to be reproducible only on a specific SPR model), I can only guess at the
root cause. I originally thought the drain_pebs() helper could be
interrupted by a PMI and then cause the issue, but as Kan said, that's not
true since the PMU is always disabled before draining the PEBS buffer.
So after thinking it over again, I suppose the reason is the following.
When intel_pmu_drain_pebs_icl() is called to drain PEBS records,
perf_event_overflow() could be called to process the last PEBS record.
perf_event_overflow() in turn could trigger interrupt throttling and
stop all events of the group, as the call chain below shows.
perf_event_overflow()
-> __perf_event_overflow()
->__perf_event_account_interrupt()
-> perf_event_throttle_group()
-> perf_event_throttle()
-> event->pmu->stop()
-> x86_pmu_stop()
The side effect of stopping the events is that all corresponding event
pointers in the cpuc->events[] array are cleared to NULL.
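For reference, that clearing happens in x86_pmu_stop(); the relevant part
looks roughly like this (paraphrased from arch/x86/events/core.c):

	if (test_bit(hwc->idx, cpuc->active_mask)) {
		static_call(x86_pmu_disable)(event);
		__clear_bit(hwc->idx, cpuc->active_mask);
		/* Later loops that index cpuc->events[] now see NULL here. */
		cpuc->events[hwc->idx] = NULL;
		hwc->state |= PERF_HES_STOPPED;
	}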
Assume there are two PEBS events (event A and event B) in a group. When
intel_pmu_drain_pebs_icl() calls perf_event_overflow() to process the
last PEBS record of event A, interrupt throttling is triggered and the
pointers for both event A and event B are cleared to NULL. Then
intel_pmu_drain_pebs_icl() tries to process the last PEBS record of
event B and dereferences a NULL pointer.
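To show where that bites, the tail loop of intel_pmu_drain_pebs_icl() that
emits the last record for each counter looks roughly like this (paraphrased,
loop bound simplified); the marked load is the one that can now return NULL:

	for_each_set_bit(bit, (unsigned long *)&mask, X86_PMC_IDX_MAX) {
		if (!counts[bit])
			continue;

		/* NULL once throttling in an earlier iteration stopped the group */
		event = cpuc->events[bit];

		__intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
					    counts[bit], setup_pebs_adaptive_sample_data);
	}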
I will cook a v3 patch to update the commit message and code. Thanks.
>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>> Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
>> Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Tested-by: kernel test robot <oliver.sang@intel.com>
>> ---
>> arch/x86/events/core.c | 3 +++
>> arch/x86/events/intel/core.c | 6 +++++-
>> arch/x86/events/intel/ds.c | 13 ++++++-------
>> 3 files changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 7610f26dfbd9..f0a3bc57157d 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -1711,6 +1711,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>> continue;
>>
>> event = cpuc->events[idx];
>> + if (!event)
>> + continue;
>> +
>> last_period = event->hw.last_period;
>>
>> val = static_call(x86_pmu_update)(event);
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 15da60cf69f2..386717b75a09 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2718,6 +2718,8 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
>> if (!is_topdown_idx(idx))
>> continue;
>> other = cpuc->events[idx];
>> + if (!other)
>> + continue;
>> other->hw.saved_slots = slots;
>> other->hw.saved_metric = metrics;
>> }
>> @@ -2761,6 +2763,8 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
>> if (!is_topdown_idx(idx))
>> continue;
>> other = cpuc->events[idx];
>> + if (!other)
>> + continue;
>> __icl_update_topdown_event(other, slots, metrics,
>> event ? event->hw.saved_slots : 0,
>> event ? event->hw.saved_metric : 0);
>> @@ -3138,7 +3142,7 @@ static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
>>
>> for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs, X86_PMC_IDX_MAX) {
>> event = cpuc->events[bit];
>> - if (!event->attr.precise_ip)
>> + if (!event || !event->attr.precise_ip)
>> continue;
>>
>> perf_sample_data_init(data, 0, event->hw.last_period);
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index c0b7ac1c7594..b23c49e2e06f 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -2480,6 +2480,8 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, u64
>> */
>> for_each_set_bit(bit, (unsigned long *)&pebs_enabled, X86_PMC_IDX_MAX) {
>> event = cpuc->events[bit];
>> + if (!event)
>> + continue;
>> if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>> intel_pmu_save_and_restart_reload(event, 0);
>> }
>> @@ -2579,10 +2581,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
>> continue;
>>
>> event = cpuc->events[bit];
>> - if (WARN_ON_ONCE(!event))
>> - continue;
>> -
>> - if (WARN_ON_ONCE(!event->attr.precise_ip))
>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>> continue;
>>
>> /* log dropped samples number */
>> @@ -2645,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>> pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
>> for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
>> event = cpuc->events[bit];
>> -
>> - if (WARN_ON_ONCE(!event) ||
>> - WARN_ON_ONCE(!event->attr.precise_ip))
>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>> continue;
>>
>> if (counts[bit]++) {
>> @@ -2663,6 +2660,8 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>> continue;
>>
>> event = cpuc->events[bit];
>> + if (!event)
>> + continue;
>>
>> __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
>> counts[bit], setup_pebs_adaptive_sample_data);
>> --
>> 2.34.1
>>
* Re: [Patch v2 3/6] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
2025-08-19 9:21 ` Mi, Dapeng
@ 2025-08-19 9:26 ` Mi, Dapeng
0 siblings, 0 replies; 16+ messages in thread
From: Mi, Dapeng @ 2025-08-19 9:26 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Kan Liang, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi,
kernel test robot
On 8/19/2025 5:21 PM, Mi, Dapeng wrote:
> On 8/19/2025 4:45 PM, Peter Zijlstra wrote:
>> On Mon, Aug 11, 2025 at 05:00:31PM +0800, Dapeng Mi wrote:
>>> The PMI handler could disable some events as part of interrupt throttling
>>> and clear the corresponding entries in the cpuc->events[] array.
>>>
>>> perf_event_overflow()
>>> -> __perf_event_overflow()
>>> ->__perf_event_account_interrupt()
>>> -> perf_event_throttle_group()
>>> -> perf_event_throttle()
>>> -> event->pmu->stop()
>>> -> x86_pmu_stop()
>>>
>>> Moreover, the PMI is an NMI on x86, so it could interrupt other perf
>>> code such as setup_pebs_adaptive_sample_data().
>> Uhh, how? AFAICT we only do drain_pebs() from the PMI itself, or disable
>> the PMU first by clearing GLOBAL_CTRL.
>>
>>> So once PMI handling
>>> finishes and returns into setup_pebs_adaptive_sample_data(), it could
>>> find that cpuc->events[*] has become NULL; dereferencing this NULL pointer
>>> triggers an invalid memory access and eventually leads to a kernel crash.
>>>
>>> Thus add a NULL check before accessing the cpuc->events[*] pointer.
>> This doesn't seem fully thought through.
>>
>> If we do this NULL check, then the active_mask bittest is completely
>> superfluous and can be removed, no?
>>
>> Also, what about this race:
>>
>> event = cpuc->events[idx]; // !NULL;
>> <PMI>
>> x86_pmu_stop()
>> cpuc->events[idx] = NULL;
>> </PMI>
>> ... uses event
>>
>> Worse, since it is a 'normal' load, it is permitted for the compiler to
>> re-issue the load, at which point it will still explode. IOW, it should
>> be READ_ONCE(), *if* we can live with the above race at all. Can we?
>>
>> First though, you need to explain how we get here. Because drain_pebs()
>> nesting would be *BAD*.
> I suppose I made a mistake in explaining why the issue happens. Since I
> can't reproduce this issue locally (I synced with Oliver and the issue seems
> to be reproducible only on a specific SPR model), I can only guess at the
> root cause. I originally thought the drain_pebs() helper could be
> interrupted by a PMI and then cause the issue, but as Kan said, that's not
> true since the PMU is always disabled before draining the PEBS buffer.
>
> So after thinking it over again, I suppose the reason is the following.
>
> When intel_pmu_drain_pebs_icl() is called to drain PEBS records,
> perf_event_overflow() could be called to process the last PEBS record.
>
> perf_event_overflow() in turn could trigger interrupt throttling and
> stop all events of the group, as the call chain below shows.
>
> perf_event_overflow()
> -> __perf_event_overflow()
> ->__perf_event_account_interrupt()
> -> perf_event_throttle_group()
> -> perf_event_throttle()
> -> event->pmu->stop()
> -> x86_pmu_stop()
>
> The side effect of stopping the events is that all corresponding event
> pointers in the cpuc->events[] array are cleared to NULL.
>
> Assume there are two PEBS events (event A and event B) in a group. When
> intel_pmu_drain_pebs_icl() calls perf_event_overflow() to process the
> last PEBS record of event A, interrupt throttling is triggered and the
> pointers for both event A and event B are cleared to NULL. Then
> intel_pmu_drain_pebs_icl() tries to process the last PEBS record of
> event B and dereferences a NULL pointer.
>
> I will cook a v3 patch to update the commit message and code. Thanks.
Forgot to say: for this scenario, I suppose we can simply skip processing
the last PEBS record if the event pointer is NULL, since the last PEBS
record was already processed when the event was stopped earlier.
>
>
>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>> Closes: https://lore.kernel.org/oe-lkp/202507042103.a15d2923-lkp@intel.com
>>> Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> Tested-by: kernel test robot <oliver.sang@intel.com>
>>> ---
>>> arch/x86/events/core.c | 3 +++
>>> arch/x86/events/intel/core.c | 6 +++++-
>>> arch/x86/events/intel/ds.c | 13 ++++++-------
>>> 3 files changed, 14 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>>> index 7610f26dfbd9..f0a3bc57157d 100644
>>> --- a/arch/x86/events/core.c
>>> +++ b/arch/x86/events/core.c
>>> @@ -1711,6 +1711,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>>> continue;
>>>
>>> event = cpuc->events[idx];
>>> + if (!event)
>>> + continue;
>>> +
>>> last_period = event->hw.last_period;
>>>
>>> val = static_call(x86_pmu_update)(event);
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 15da60cf69f2..386717b75a09 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -2718,6 +2718,8 @@ static void update_saved_topdown_regs(struct perf_event *event, u64 slots,
>>> if (!is_topdown_idx(idx))
>>> continue;
>>> other = cpuc->events[idx];
>>> + if (!other)
>>> + continue;
>>> other->hw.saved_slots = slots;
>>> other->hw.saved_metric = metrics;
>>> }
>>> @@ -2761,6 +2763,8 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
>>> if (!is_topdown_idx(idx))
>>> continue;
>>> other = cpuc->events[idx];
>>> + if (!other)
>>> + continue;
>>> __icl_update_topdown_event(other, slots, metrics,
>>> event ? event->hw.saved_slots : 0,
>>> event ? event->hw.saved_metric : 0);
>>> @@ -3138,7 +3142,7 @@ static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
>>>
>>> for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs, X86_PMC_IDX_MAX) {
>>> event = cpuc->events[bit];
>>> - if (!event->attr.precise_ip)
>>> + if (!event || !event->attr.precise_ip)
>>> continue;
>>>
>>> perf_sample_data_init(data, 0, event->hw.last_period);
>>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>>> index c0b7ac1c7594..b23c49e2e06f 100644
>>> --- a/arch/x86/events/intel/ds.c
>>> +++ b/arch/x86/events/intel/ds.c
>>> @@ -2480,6 +2480,8 @@ static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, u64
>>> */
>>> for_each_set_bit(bit, (unsigned long *)&pebs_enabled, X86_PMC_IDX_MAX) {
>>> event = cpuc->events[bit];
>>> + if (!event)
>>> + continue;
>>> if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>>> intel_pmu_save_and_restart_reload(event, 0);
>>> }
>>> @@ -2579,10 +2581,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
>>> continue;
>>>
>>> event = cpuc->events[bit];
>>> - if (WARN_ON_ONCE(!event))
>>> - continue;
>>> -
>>> - if (WARN_ON_ONCE(!event->attr.precise_ip))
>>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>>> continue;
>>>
>>> /* log dropped samples number */
>>> @@ -2645,9 +2644,7 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>>> pebs_status = basic->applicable_counters & cpuc->pebs_enabled & mask;
>>> for_each_set_bit(bit, (unsigned long *)&pebs_status, X86_PMC_IDX_MAX) {
>>> event = cpuc->events[bit];
>>> -
>>> - if (WARN_ON_ONCE(!event) ||
>>> - WARN_ON_ONCE(!event->attr.precise_ip))
>>> + if (!event || WARN_ON_ONCE(!event->attr.precise_ip))
>>> continue;
>>>
>>> if (counts[bit]++) {
>>> @@ -2663,6 +2660,8 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
>>> continue;
>>>
>>> event = cpuc->events[bit];
>>> + if (!event)
>>> + continue;
>>>
>>> __intel_pmu_pebs_last_event(event, iregs, regs, data, last[bit],
>>> counts[bit], setup_pebs_adaptive_sample_data);
>>> --
>>> 2.34.1
>>>