* [PATCH] perf/x86: Restore event pointer setup in x86_pmu_start()
@ 2026-03-09 14:40 Breno Leitao
2026-03-09 16:38 ` Peter Zijlstra
2026-03-10 1:45 ` Mi, Dapeng
0 siblings, 2 replies; 4+ messages in thread
From: Breno Leitao @ 2026-03-09 14:40 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, James Clark, Thomas Gleixner,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Dapeng Mi
Cc: linux-perf-users, linux-kernel, kernel-team, Breno Leitao

A production AMD EPYC system crashed with a NULL pointer dereference
in the PMU NMI handler:

  BUG: kernel NULL pointer dereference, address: 0000000000000198
  RIP: x86_perf_event_update+0xc/0xa0
  Call Trace:
   <NMI>
   amd_pmu_v2_handle_irq+0x1a6/0x390
   perf_event_nmi_handler+0x24/0x40

The faulting instruction is `cmpq $0x0, 0x198(%rdi)` with RDI=0,
corresponding to the `if (unlikely(!hwc->event_base))` check in
x86_perf_event_update() where hwc = &event->hw and event is NULL.

drgn inspection of the vmcore on CPU 106 showed a mismatch between
cpuc->active_mask and cpuc->events[]:

  active_mask: 0x1e (bits 1, 2, 3, 4)
  events[1]:   0xff1100136cbd4f38 (valid)
  events[2]:   0x0                (NULL, but active_mask bit 2 set)
  events[3]:   0xff1100076fd2cf38 (valid)
  events[4]:   0xff1100079e990a90 (valid)

The event that should occupy events[2] was found in event_list[2]
with hw.idx=2 and hw.state=0x0, confirming x86_pmu_start() had run
(which clears hw.state and sets active_mask) but events[2] was
never populated.
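
The vmcore shows a violation of the invariant the NMI handler depends on: every counter whose bit is set in active_mask must have a non-NULL events[] entry. Below is a minimal user-space sketch of a checker for that invariant, assuming a simplified snapshot of the two fields. struct cpuc_snapshot, NUM_COUNTERS and the helper functions are hypothetical; they are not struct cpu_hw_events or any kernel API. Dummy objects stand in for the real event addresses.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COUNTERS 6	/* hypothetical; covers counters 0-5 */

/* Hypothetical snapshot of the two per-CPU fields read from the vmcore. */
struct cpuc_snapshot {
	unsigned long active_mask;		/* bit i set => counter i active */
	const void *events[NUM_COUNTERS];	/* event pointer per counter */
};

static int counter_active(const struct cpuc_snapshot *s, int idx)
{
	assert(idx >= 0 && idx < NUM_COUNTERS);
	return (int)((s->active_mask >> idx) & 1UL);
}

/*
 * Return the first counter that violates
 *   "active_mask bit i set => events[i] != NULL",
 * or -1 if the snapshot is consistent.
 */
static int first_null_active_counter(const struct cpuc_snapshot *s)
{
	for (int i = 0; i < NUM_COUNTERS; i++)
		if (counter_active(s, i) && s->events[i] == NULL)
			return i;
	return -1;
}

/* Replay the CPU 106 values; dummy objects replace the real addresses. */
static int check_cpu106(void)
{
	static int ev1, ev3, ev4;
	const struct cpuc_snapshot s = {
		.active_mask = 0x1e,	/* bits 1, 2, 3, 4 */
		.events = { [1] = &ev1, [2] = NULL, [3] = &ev3, [4] = &ev4 },
	};

	return first_null_active_counter(&s);
}
```

check_cpu106() reports counter 2, the slot the handler dereferenced. A snapshot with an empty active_mask reports -1.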

Another event (event_list[0]) had hw.state=0x7 (STOPPED|UPTODATE|ARCH),
showing it was stopped when the PMU rescheduled events, confirming the
throttle-then-reschedule sequence occurred.
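
The value 0x7 is the bitwise OR of the three hw.state flags. The following small sketch decodes such a value. It assumes the PERF_HES_* values from include/linux/perf_event.h (STOPPED 0x01, UPTODATE 0x02, ARCH 0x04). decode_hes() is a hypothetical helper, and it ignores any bits outside these three.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flag values as in include/linux/perf_event.h. */
#define PERF_HES_STOPPED	0x01	/* the counter is stopped */
#define PERF_HES_UPTODATE	0x02	/* event->count up-to-date */
#define PERF_HES_ARCH		0x04

/*
 * Render an hw.state value as "STOPPED|UPTODATE|ARCH"-style text in buf
 * and return buf. A state with none of these bits set renders as "0".
 */
static const char *decode_hes(unsigned int state, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } flags[] = {
		{ PERF_HES_STOPPED,  "STOPPED"  },
		{ PERF_HES_UPTODATE, "UPTODATE" },
		{ PERF_HES_ARCH,     "ARCH"     },
	};

	assert(len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(state & flags[i].bit))
			continue;
		if (buf[0] != '\0')	/* separator between flag names */
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, flags[i].name, len - strlen(buf) - 1);
	}
	if (buf[0] == '\0')
		snprintf(buf, len, "0");
	return buf;
}
```

With a 64-byte buffer, decode_hes(0x7, ...) yields "STOPPED|UPTODATE|ARCH". The 0x0 state of the event found in event_list[2] decodes to "0", meaning it had been started.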

The root cause is commit 7e772a93eb61 ("perf/x86: Fix NULL event access
and potential PEBS record loss") which moved the cpuc->events[idx]
assignment out of x86_pmu_start() and into x86_pmu_enable(). This
broke any path that calls pmu->start() without going through
x86_pmu_enable() -- specifically the unthrottle path:

  perf_adjust_freq_unthr_events()
    -> perf_event_unthrottle_group()
      -> perf_event_unthrottle()
        -> event->pmu->start(event, 0)
          -> x86_pmu_start() // sets active_mask but not events[]

The race sequence is:

1. A group of perf events overflows, triggering group throttle via
   perf_event_throttle_group(). All events are stopped: active_mask
   bits cleared, events[] preserved (x86_pmu_stop no longer clears
   events[] after commit 7e772a93eb61).

2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
   due to other scheduling activity. Stopped events that need to
   move counters get PERF_HES_ARCH set and events[old_idx] cleared.
   In step 2 of x86_pmu_enable(), PERF_HES_ARCH causes these events
   to be skipped -- events[new_idx] is never set.

3. The timer tick unthrottles the group via pmu->start(). Since
   commit 7e772a93eb61 removed the events[] assignment from
   x86_pmu_start(), active_mask[new_idx] is set but events[new_idx]
   remains NULL.

4. A PMC overflow NMI fires. The handler iterates active counters,
   finds active_mask[2] set, reads events[2] which is NULL, and
   crashes dereferencing it.

Restore cpuc->events[idx] = event in x86_pmu_start() so that every
caller of pmu->start() correctly populates events[] before setting
active_mask. This does not reintroduce the PEBS issue that commit
7e772a93eb61 fixed, because that fix also moved the events[] = NULL
clearing from x86_pmu_stop() to x86_pmu_del() -- throttle/unthrottle
cycles no longer clear events[].

Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/x86/events/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 03ce1bc7ef2ea..fd82d1427b335 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1546,6 +1546,11 @@ static void x86_pmu_start(struct perf_event *event, int flags)
 
 	event->hw.state = 0;
 
+	/*
+	 * Ensure events[idx] is set before active_mask, so NMI handlers
+	 * never see an active counter with a NULL event pointer.
+	 */
+	cpuc->events[idx] = event;
 	__set_bit(idx, cpuc->active_mask);
 	static_call(x86_pmu_enable)(event);
 	perf_event_update_userpage(event);

---
base-commit: 0bcac7b11262557c990da1ac564d45777eb6b005
change-id: 20260309-perf-fd32da0317a8
Best regards,
--
Breno Leitao <leitao@debian.org>

* Re: [PATCH] perf/x86: Restore event pointer setup in x86_pmu_start()
From: Peter Zijlstra @ 2026-03-09 16:38 UTC
To: Breno Leitao
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Dapeng Mi, linux-perf-users, linux-kernel,
kernel-team
On Mon, Mar 09, 2026 at 07:40:56AM -0700, Breno Leitao wrote:
> [...]
>
> 2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
> due to other scheduling activity. Stopped events that need to
> move counters get PERF_HES_ARCH set and events[old_idx] cleared.
> In step 2 of x86_pmu_enable(), PERF_HES_ARCH causes these events
> to be skipped -- events[new_idx] is never set.
So why not just move this then? Having less sites that set that value is
more better, no?

---
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 03ce1bc7ef2e..54b4c315d927 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1372,6 +1372,8 @@ static void x86_pmu_enable(struct pmu *pmu)
 			else if (i < n_running)
 				continue;
 
+			cpuc->events[hwc->idx] = event;
+
 			if (hwc->state & PERF_HES_ARCH)
 				continue;
 
@@ -1379,7 +1381,6 @@ static void x86_pmu_enable(struct pmu *pmu)
 			 * if cpuc->enabled = 0, then no wrmsr as
 			 * per x86_pmu_enable_event()
 			 */
-			cpuc->events[hwc->idx] = event;
 			x86_pmu_start(event, PERF_EF_RELOAD);
 		}
 		cpuc->n_added = 0;

* Re: [PATCH] perf/x86: Restore event pointer setup in x86_pmu_start()
From: Breno Leitao @ 2026-03-09 17:00 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Dapeng Mi, linux-perf-users, linux-kernel,
kernel-team
On Mon, Mar 09, 2026 at 05:38:47PM +0100, Peter Zijlstra wrote:
> On Mon, Mar 09, 2026 at 07:40:56AM -0700, Breno Leitao wrote:
> So why not just move this then? Having less sites that set that value is
> more better, no?
>
Sure, let me update.
Thanks for the review,
--breno

* Re: [PATCH] perf/x86: Restore event pointer setup in x86_pmu_start()
From: Mi, Dapeng @ 2026-03-10 1:45 UTC
To: Breno Leitao, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin
Cc: linux-perf-users, linux-kernel, kernel-team
On 3/9/2026 10:40 PM, Breno Leitao wrote:
> [...]
>
> 4. A PMC overflow NMI fires. The handler iterates active counters,
> finds active_mask[2] set, reads events[2] which is NULL, and
> crashes dereferencing it.

Thanks for fixing this issue. Better add a "Cc: stable@vger.kernel.org"
tag as well.