* [PATCH] perf/x86/intel/ds: Fix loop ordering in release_ds_buffers()
@ 2026-05-13 15:43 Rik van Riel
From: Rik van Riel @ 2026-05-13 15:43 UTC
To: linux-kernel
Cc: x86, kernel-team, Mi, Dapeng, Peter Zijlstra, Ingo Molnar,
Dave Hansen

release_ds_buffers() has three loops:

  1. release_ds_buffer() - NULLs hwev->ds for each CPU
  2. fini_debug_store_on_cpu() - clears MSR_IA32_DS_AREA
  3. release_pebs_buffer()/release_bts_buffer() - unmaps CEA pages and
     frees backing pages

The problem is that fini_debug_store_on_cpu() checks if hwev->ds is
NULL and returns early if so. Since loop 1 already NULLed hwev->ds,
loop 2 never actually clears MSR_IA32_DS_AREA. Then loop 3 unmaps the
CEA pages, leaving the MSR pointing at now-unmapped memory. When a PEBS
overflow fires, the hardware writes to unmapped pages, causing page
faults in random victim code.

Fix by calling fini_debug_store_on_cpu() BEFORE release_ds_buffer(), so the
MSR is cleared while hwev->ds is still valid.

Observed crash signature:

  BUG: unable to handle kernel paging request in __lookup_object
  CR2: fffffe00004b7028 (CEA range)
  RIP: __lookup_object+0x39 (cmp %rdi,%rax -- register-only, can't fault)
  Secondary: TASK stack guard page hit (recursive page fault overflow)

Assisted-by: Claude:claude-opus-4.7 syzkaller
Signed-off-by: Rik van Riel <riel@surriel.com>
---
arch/x86/events/intel/ds.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 2abfeb4e2908..85894673f03b 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -973,18 +973,25 @@ void release_ds_buffers(void)
 	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
 		return;
 
-	for_each_possible_cpu(cpu)
-		release_ds_buffer(cpu);
-
 	for_each_possible_cpu(cpu) {
 		/*
-		 * Again, ignore errors from offline CPUs, they will no longer
-		 * observe cpu_hw_events.ds and not program the DS_AREA when
-		 * they come up.
+		 * Clear MSR_IA32_DS_AREA BEFORE NULLing hwev->ds.
+		 * fini_debug_store_on_cpu() checks hwev->ds and bails
+		 * if it's NULL, so calling release_ds_buffer() first
+		 * would prevent the MSR from being cleared. That leaves
+		 * the hardware writing into CEA pages that get unmapped
+		 * below, causing asynchronous page faults at random RIPs.
+		 *
+		 * Ignore errors from offline CPUs, they will no longer
+		 * observe cpu_hw_events.ds and not program the DS_AREA
+		 * when they come up.
 		 */
 		fini_debug_store_on_cpu(cpu);
 	}
 
+	for_each_possible_cpu(cpu)
+		release_ds_buffer(cpu);
+
 	for_each_possible_cpu(cpu) {
 		if (x86_pmu.ds_pebs)
 			release_pebs_buffer(cpu);
--
2.53.0-Meta