public inbox for stable@vger.kernel.org
* [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
@ 2026-03-10 10:13 Breno Leitao
  2026-03-11  2:04 ` Mi, Dapeng
  2026-03-16  9:50 ` [tip: perf/urgent] " tip-bot2 for Breno Leitao
  0 siblings, 2 replies; 13+ messages in thread
From: Breno Leitao @ 2026-03-10 10:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Thomas Gleixner,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Dapeng Mi
  Cc: linux-perf-users, linux-kernel, kernel-team, Breno Leitao, stable

A production AMD EPYC system crashed with a NULL pointer dereference
in the PMU NMI handler:

  BUG: kernel NULL pointer dereference, address: 0000000000000198
  RIP: x86_perf_event_update+0xc/0xa0
  Call Trace:
   <NMI>
   amd_pmu_v2_handle_irq+0x1a6/0x390
   perf_event_nmi_handler+0x24/0x40

The faulting instruction is `cmpq $0x0, 0x198(%rdi)` with RDI=0,
corresponding to the `if (unlikely(!hwc->event_base))` check in
x86_perf_event_update() where hwc = &event->hw and event is NULL.

drgn inspection of the vmcore on CPU 106 showed a mismatch between
cpuc->active_mask and cpuc->events[]:

  active_mask: 0x1e (bits 1, 2, 3, 4)
  events[1]:   0xff1100136cbd4f38  (valid)
  events[2]:   0x0                 (NULL, but active_mask bit 2 set)
  events[3]:   0xff1100076fd2cf38  (valid)
  events[4]:   0xff1100079e990a90  (valid)

The event that should occupy events[2] was found in event_list[2]
with hw.idx=2 and hw.state=0x0, confirming x86_pmu_start() had run
(which clears hw.state and sets active_mask) but events[2] was
never populated.

Another event (event_list[0]) had hw.state=0x7 (STOPPED|UPTODATE|ARCH),
showing it was stopped when the PMU rescheduled events, confirming the
throttle-then-reschedule sequence occurred.

The root cause is commit 7e772a93eb61 ("perf/x86: Fix NULL event access
and potential PEBS record loss") which moved the cpuc->events[idx]
assignment out of x86_pmu_start() and into step 2 of x86_pmu_enable(),
after the PERF_HES_ARCH check. This broke any path that calls
pmu->start() without going through x86_pmu_enable() -- specifically
the unthrottle path:

  perf_adjust_freq_unthr_events()
    -> perf_event_unthrottle_group()
      -> perf_event_unthrottle()
        -> event->pmu->start(event, 0)
          -> x86_pmu_start()     // sets active_mask but not events[]

The race sequence is:

  1. A group of perf events overflows, triggering group throttle via
     perf_event_throttle_group(). All events are stopped: active_mask
     bits cleared, events[] preserved (x86_pmu_stop no longer clears
     events[] after commit 7e772a93eb61).

  2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
     due to other scheduling activity. Stopped events that need to
     move counters get PERF_HES_ARCH set and events[old_idx] cleared.
     In step 2 of x86_pmu_enable(), PERF_HES_ARCH causes these events
     to be skipped -- events[new_idx] is never set.

  3. The timer tick unthrottles the group via pmu->start(). Since
     commit 7e772a93eb61 removed the events[] assignment from
     x86_pmu_start(), active_mask[new_idx] is set but events[new_idx]
     remains NULL.

  4. A PMC overflow NMI fires. The handler iterates active counters,
     finds active_mask[2] set, reads events[2] which is NULL, and
     crashes dereferencing it.

Move the cpuc->events[hwc->idx] assignment in x86_pmu_enable() to
before the PERF_HES_ARCH check, so that events[] is populated even
for events that are not immediately started. This ensures the
unthrottle path via pmu->start() always finds a valid event pointer.

Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
Signed-off-by: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
---
Changes in v2:
- Move event pointer setup earlier in x86_pmu_enable() (peterz)
- Rewrote the patch title, given the new approach
- Link to v1: https://patch.msgid.link/20260309-perf-v1-1-601ffb531893@debian.org
---
 arch/x86/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 03ce1bc7ef2ea..54b4c315d927f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1372,6 +1372,8 @@ static void x86_pmu_enable(struct pmu *pmu)
 			else if (i < n_running)
 				continue;
 
+			cpuc->events[hwc->idx] = event;
+
 			if (hwc->state & PERF_HES_ARCH)
 				continue;
 
@@ -1379,7 +1381,6 @@ static void x86_pmu_enable(struct pmu *pmu)
 			 * if cpuc->enabled = 0, then no wrmsr as
 			 * per x86_pmu_enable_event()
 			 */
-			cpuc->events[hwc->idx] = event;
 			x86_pmu_start(event, PERF_EF_RELOAD);
 		}
 		cpuc->n_added = 0;

---
base-commit: 0bcac7b11262557c990da1ac564d45777eb6b005
change-id: 20260309-perf-fd32da0317a8

Best regards,
--  
Breno Leitao <leitao@debian.org>


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-10 10:13 [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable() Breno Leitao
@ 2026-03-11  2:04 ` Mi, Dapeng
  2026-03-11 16:37   ` Ian Rogers
  2026-03-11 17:18   ` Peter Zijlstra
  2026-03-16  9:50 ` [tip: perf/urgent] " tip-bot2 for Breno Leitao
  1 sibling, 2 replies; 13+ messages in thread
From: Mi, Dapeng @ 2026-03-11  2:04 UTC (permalink / raw)
  To: Breno Leitao, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin
  Cc: linux-perf-users, linux-kernel, kernel-team, stable


On 3/10/2026 6:13 PM, Breno Leitao wrote:
> [...]
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 03ce1bc7ef2ea..54b4c315d927f 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1372,6 +1372,8 @@ static void x86_pmu_enable(struct pmu *pmu)
>  			else if (i < n_running)
>  				continue;
>  
> +			cpuc->events[hwc->idx] = event;
> +
>  			if (hwc->state & PERF_HES_ARCH)
>  				continue;
>  
> @@ -1379,7 +1381,6 @@ static void x86_pmu_enable(struct pmu *pmu)
>  			 * if cpuc->enabled = 0, then no wrmsr as
>  			 * per x86_pmu_enable_event()
>  			 */
> -			cpuc->events[hwc->idx] = event;
>  			x86_pmu_start(event, PERF_EF_RELOAD);
>  		}
>  		cpuc->n_added = 0;

Thinking about it again, it seems this change could subtly break the
current PEBS counter snapshot logic.

Currently, intel_perf_event_update_pmc() needs to filter out these
uninitialized counters by checking whether the event is NULL, as the
comments and code below show.

```

     * - An event is stopped for some reason, e.g., throttled.
     *   During this period, another event is added and takes the
     *   counter of the stopped event. The stopped event is assigned
     *   to another new and uninitialized counter, since the
     *   x86_pmu_start(RELOAD) is not invoked for a stopped event.
     *   The PEBS_DATA_CFG is updated regardless of the event state.
     *   The uninitialized counter can be recorded in a PEBS record.
     *   But the cpuc->events[uninitialized_counter] is always NULL,
     *   because the event is stopped. The uninitialized value is
     *   safely dropped.
     */
    if (!event)
        return;

```

With this change, the original index of a stopped event could be
assigned to a new event. In that case, although the new event has not
been activated yet, cpuc->events[original_index] has been initialized
and is no longer NULL, so intel_perf_event_update_pmc() could attribute
the cached count value to the wrong event.

I suppose we have two ways to fix this issue.

1. Move "cpuc->events[idx] = event" into x86_pmu_start(), just like what
the v1 patch does.

2. Check cpuc->active_mask in intel_perf_event_update_pmc() as well.
The side effect is that the cached counter snapshots for stopped events
have to be dropped, so there is no chance to update the count value for
these stopped events even when their HW indexes are not occupied by
other new events.

Peter, what do you think about this? Thanks.



* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-11  2:04 ` Mi, Dapeng
@ 2026-03-11 16:37   ` Ian Rogers
  2026-03-11 17:35     ` Peter Zijlstra
  2026-03-11 17:18   ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Ian Rogers @ 2026-03-11 16:37 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Breno Leitao, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-perf-users, linux-kernel, kernel-team,
	stable

On Tue, Mar 10, 2026 at 7:04 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>
>
> On 3/10/2026 6:13 PM, Breno Leitao wrote:
> > [...]
>
> Thinking about it again, it seems this change could subtly break the
> current PEBS counter snapshot logic.
>
> Currently, intel_perf_event_update_pmc() needs to filter out these
> uninitialized counters by checking whether the event is NULL, as the
> comments and code below show.
>
> ```
>
>      * - An event is stopped for some reason, e.g., throttled.
>      *   During this period, another event is added and takes the
>      *   counter of the stopped event. The stopped event is assigned
>      *   to another new and uninitialized counter, since the
>      *   x86_pmu_start(RELOAD) is not invoked for a stopped event.
>      *   The PEBS_DATA_CFG is updated regardless of the event state.
>      *   The uninitialized counter can be recorded in a PEBS record.
>      *   But the cpuc->events[uninitialized_counter] is always NULL,
>      *   because the event is stopped. The uninitialized value is
>      *   safely dropped.
>      */
>     if (!event)
>         return;
>
> ```
>
> With this change, the original index of a stopped event could be
> assigned to a new event. In that case, although the new event has not
> been activated yet, cpuc->events[original_index] has been initialized
> and is no longer NULL, so intel_perf_event_update_pmc() could attribute
> the cached count value to the wrong event.
>
> [...]

Fwiw, an AI review was saying something similar to me. I wonder if the
issue with NMI storms exists already via another path:

By populating cpuc->events[idx] here, even for events that skip
x86_pmu_start() due to the PERF_HES_ARCH check, can this lead to PEBS
data corruption?

For Intel PEBS, intel_pmu_pebs_late_setup() iterates over cpuc->event_list
and enables PEBS_DATA_CFG for this counter index regardless of its stopped
state. If the PMU hardware generates PEBS records for this uninitialized
counter, or if there are leftover PEBS records from the counter's previous
occupant (since x86_pmu_stop() does not drain the PEBS buffer),
intel_perf_event_update_pmc() will now find a non-NULL event pointer.
Will it incorrectly process these leftover records and attribute them
to the stopped event?

Additionally, does this change leave the unthrottled event's hardware
counter uninitialized?

When x86_pmu_enable() moves a throttled event to a new counter, it sets
PERF_HES_ARCH and skips calling x86_pmu_start(event, PERF_EF_RELOAD).
Later, when the timer tick unthrottles the event, it calls
perf_event_unthrottle(), which invokes event->pmu->start(event, 0).
In x86_pmu_start(), because flags == 0 (lacking PERF_EF_RELOAD),
x86_pmu_set_period() is skipped. Will the newly assigned hardware counter
be enabled without being programmed with the event's period, potentially
causing it to start from a garbage value and leading to incorrect sampling
intervals or NMI storms?

Thanks,
Ian



* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-11  2:04 ` Mi, Dapeng
  2026-03-11 16:37   ` Ian Rogers
@ 2026-03-11 17:18   ` Peter Zijlstra
  2026-03-12  1:05     ` Mi, Dapeng
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2026-03-11 17:18 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Breno Leitao, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-perf-users, linux-kernel,
	kernel-team, stable

On Wed, Mar 11, 2026 at 10:04:10AM +0800, Mi, Dapeng wrote:
> 
> On 3/10/2026 6:13 PM, Breno Leitao wrote:
> > [...]
> 
> Thinking about it again, it seems this change could subtly break the
> current PEBS counter snapshot logic.
> 
> Currently, intel_perf_event_update_pmc() needs to filter out these
> uninitialized counters by checking whether the event is NULL, as the
> comments and code below show.
> 
> ```
> 
>      * - An event is stopped for some reason, e.g., throttled.
>      *   During this period, another event is added and takes the
>      *   counter of the stopped event. The stopped event is assigned
>      *   to another new and uninitialized counter, since the
>      *   x86_pmu_start(RELOAD) is not invoked for a stopped event.
>      *   The PEBS_DATA_CFG is updated regardless of the event state.
>      *   The uninitialized counter can be recorded in a PEBS record.
>      *   But the cpuc->events[uninitialized_counter] is always NULL,
>      *   because the event is stopped. The uninitialized value is
>      *   safely dropped.
>      */
>     if (!event)
>         return;
> 
> ```
> 
> With this change, the original index of a stopped event could be
> assigned to a new event. In that case, although the new event has not
> been activated yet, cpuc->events[original_index] has been initialized
> and is no longer NULL, so intel_perf_event_update_pmc() could attribute
> the cached count value to the wrong event.
> 
> I suppose we have two ways to fix this issue.
> 
> 1. Move "cpuc->events[idx] = event" into x86_pmu_start(), just like what
> the v1 patch does.

That's not what v1 did; v1 added an additional assignment.

> 2. Check cpuc->active_mask in intel_perf_event_update_pmc() as well.
> The side effect is that the cached counter snapshots for stopped events
> have to be dropped, so there is no chance to update the count value for
> these stopped events even when their HW indexes are not occupied by
> other new events.
> 
> Peter, what do you think about this? Thanks.

So you're saying that intel_perf_event_update_pmc() will be trying to
read the hardware counter, which hasn't been written with a sensible
value (and will thus misbehave), even though the event is STOPPED and
the active_mask bit is unset?

I'm thinking intel_perf_event_update_pmc() needs help either way around
:-)


* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-11 16:37   ` Ian Rogers
@ 2026-03-11 17:35     ` Peter Zijlstra
  2026-03-11 20:40       ` Peter Zijlstra
  2026-03-12  1:46       ` Mi, Dapeng
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2026-03-11 17:35 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Mi, Dapeng, Breno Leitao, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-perf-users, linux-kernel,
	kernel-team, stable

On Wed, Mar 11, 2026 at 09:37:08AM -0700, Ian Rogers wrote:

> Fwiw, an AI review was saying something similar to me. 

How do you guys get this AI stuff to say anything sensible at all about
the code? Every time I try, it is telling me the most abject nonsense.
I've wasted hours of my life trying to guide it to known bugs, in vain.

> I wonder if the
> issue with NMI storms exists already via another path:
> 
> By populating cpuc->events[idx] here, even for events that skip
> x86_pmu_start() due to the PERF_HES_ARCH check, can this lead to PEBS
> data corruption?
> 
> For Intel PEBS, intel_pmu_pebs_late_setup() iterates over cpuc->event_list
> and enables PEBS_DATA_CFG for this counter index regardless of its stopped
> state. If the PMU hardware generates PEBS records for this uninitialized
> counter, or if there are leftover PEBS records from the counter's previous
> occupant (since x86_pmu_stop() does not drain the PEBS buffer),
> intel_perf_event_update_pmc() will now find a non-NULL event pointer.
> Will it incorrectly process these leftover records and attribute them
> to the stopped event?

Yes.

> Additionally, does this change leave the unthrottled event's hardware
> counter uninitialized?

Also yes.

> When x86_pmu_enable() moves a throttled event to a new counter, it sets
> PERF_HES_ARCH and skips calling x86_pmu_start(event, PERF_EF_RELOAD).
> Later, when the timer tick unthrottles the event, it calls
> perf_event_unthrottle(), which invokes event->pmu->start(event, 0).
> In x86_pmu_start(), because flags == 0 (lacking PERF_EF_RELOAD),
> x86_pmu_set_period() is skipped. Will the newly assigned hardware counter
> be enabled without being programmed with the event's period, potentially
> causing it to start from a garbage value and leading to incorrect sampling
> intervals or NMI storms?

Don't think you can get NMI storms this way. If the PMI trips, we'll do
set_period and that should fix it up.

So just wrong counting.


* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-11 17:35     ` Peter Zijlstra
@ 2026-03-11 20:40       ` Peter Zijlstra
  2026-03-12  2:53         ` Mi, Dapeng
  2026-03-12  1:46       ` Mi, Dapeng
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2026-03-11 20:40 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Mi, Dapeng, Breno Leitao, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-perf-users, linux-kernel,
	kernel-team, stable

On Wed, Mar 11, 2026 at 06:35:09PM +0100, Peter Zijlstra wrote:
> > Additionally, does this change leave the unthrottled event's hardware
> > counter uninitialized?
> 
> Also yes.

Something like so on top of things I suppose.

---
Subject: x86/perf: Make sure to program the counter value for stopped events on migration
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Mar 11 21:29:14 CET 2026

Both Mi Dapeng and Ian Rogers noted that not everything that sets HES_STOPPED
is required to EF_UPDATE. Specifically the 'step 1' loop of rescheduling
explicitly does EF_UPDATE to ensure the counter value is read.

However, then 'step 2' simply leaves the new counter uninitialized when
HES_STOPPED, even though, as noted above, the thing that stopped them might not
be aware it needs to EF_RELOAD -- since it didn't EF_UPDATE on stop.

One such location that is affected is throttling, throttle does pmu->stop(, 0);
and unthrottle does pmu->start(, 0); possibly restarting an uninitialized counter.

Fixes: a4eaf7f14675 ("perf: Rework the PMU methods")
Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/events/core.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1374,8 +1374,10 @@ static void x86_pmu_enable(struct pmu *p
 
 			cpuc->events[hwc->idx] = event;
 
-			if (hwc->state & PERF_HES_ARCH)
+			if (hwc->state & PERF_HES_ARCH) {
+				static_call(x86_pmu_set_period)(event);
 				continue;
+			}
 
 			/*
 			 * if cpuc->enabled = 0, then no wrmsr as

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-11 17:18   ` Peter Zijlstra
@ 2026-03-12  1:05     ` Mi, Dapeng
  0 siblings, 0 replies; 13+ messages in thread
From: Mi, Dapeng @ 2026-03-12  1:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Breno Leitao, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-perf-users, linux-kernel,
	kernel-team, stable


On 3/12/2026 1:18 AM, Peter Zijlstra wrote:
> On Wed, Mar 11, 2026 at 10:04:10AM +0800, Mi, Dapeng wrote:
>> On 3/10/2026 6:13 PM, Breno Leitao wrote:
>>> A production AMD EPYC system crashed with a NULL pointer dereference
>>> in the PMU NMI handler:
>>>
>>>   BUG: kernel NULL pointer dereference, address: 0000000000000198
>>>   RIP: x86_perf_event_update+0xc/0xa0
>>>   Call Trace:
>>>    <NMI>
>>>    amd_pmu_v2_handle_irq+0x1a6/0x390
>>>    perf_event_nmi_handler+0x24/0x40
>>>
>>> The faulting instruction is `cmpq $0x0, 0x198(%rdi)` with RDI=0,
>>> corresponding to the `if (unlikely(!hwc->event_base))` check in
>>> x86_perf_event_update() where hwc = &event->hw and event is NULL.
>>>
>>> drgn inspection of the vmcore on CPU 106 showed a mismatch between
>>> cpuc->active_mask and cpuc->events[]:
>>>
>>>   active_mask: 0x1e (bits 1, 2, 3, 4)
>>>   events[1]:   0xff1100136cbd4f38  (valid)
>>>   events[2]:   0x0                 (NULL, but active_mask bit 2 set)
>>>   events[3]:   0xff1100076fd2cf38  (valid)
>>>   events[4]:   0xff1100079e990a90  (valid)
>>>
>>> The event that should occupy events[2] was found in event_list[2]
>>> with hw.idx=2 and hw.state=0x0, confirming x86_pmu_start() had run
>>> (which clears hw.state and sets active_mask) but events[2] was
>>> never populated.
>>>
>>> Another event (event_list[0]) had hw.state=0x7 (STOPPED|UPTODATE|ARCH),
>>> showing it was stopped when the PMU rescheduled events, confirming the
>>> throttle-then-reschedule sequence occurred.
>>>
>>> The root cause is commit 7e772a93eb61 ("perf/x86: Fix NULL event access
>>> and potential PEBS record loss") which moved the cpuc->events[idx]
>>> assignment out of x86_pmu_start() and into step 2 of x86_pmu_enable(),
>>> after the PERF_HES_ARCH check. This broke any path that calls
>>> pmu->start() without going through x86_pmu_enable() -- specifically
>>> the unthrottle path:
>>>
>>>   perf_adjust_freq_unthr_events()
>>>     -> perf_event_unthrottle_group()
>>>       -> perf_event_unthrottle()
>>>         -> event->pmu->start(event, 0)
>>>           -> x86_pmu_start()     // sets active_mask but not events[]
>>>
>>> The race sequence is:
>>>
>>>   1. A group of perf events overflows, triggering group throttle via
>>>      perf_event_throttle_group(). All events are stopped: active_mask
>>>      bits cleared, events[] preserved (x86_pmu_stop no longer clears
>>>      events[] after commit 7e772a93eb61).
>>>
>>>   2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
>>>      due to other scheduling activity. Stopped events that need to
>>>      move counters get PERF_HES_ARCH set and events[old_idx] cleared.
>>>      In step 2 of x86_pmu_enable(), PERF_HES_ARCH causes these events
>>>      to be skipped -- events[new_idx] is never set.
>>>
>>>   3. The timer tick unthrottles the group via pmu->start(). Since
>>>      commit 7e772a93eb61 removed the events[] assignment from
>>>      x86_pmu_start(), active_mask[new_idx] is set but events[new_idx]
>>>      remains NULL.
>>>
>>>   4. A PMC overflow NMI fires. The handler iterates active counters,
>>>      finds active_mask[2] set, reads events[2] which is NULL, and
>>>      crashes dereferencing it.
>>>
>>> Move the cpuc->events[hwc->idx] assignment in x86_pmu_enable() to
>>> before the PERF_HES_ARCH check, so that events[] is populated even
>>> for events that are not immediately started. This ensures the
>>> unthrottle path via pmu->start() always finds a valid event pointer.
>>>
>>> Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>> Cc: stable@vger.kernel.org
>>> ---
>>> Changes in v2:
>>> - Move event pointer setup earlier in x86_pmu_enable() (peterz)
>>> - Rewrote the patch title, given the new approach
>>> - Link to v1: https://patch.msgid.link/20260309-perf-v1-1-601ffb531893@debian.org
>>> ---
>>>  arch/x86/events/core.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>>> index 03ce1bc7ef2ea..54b4c315d927f 100644
>>> --- a/arch/x86/events/core.c
>>> +++ b/arch/x86/events/core.c
>>> @@ -1372,6 +1372,8 @@ static void x86_pmu_enable(struct pmu *pmu)
>>>  			else if (i < n_running)
>>>  				continue;
>>>  
>>> +			cpuc->events[hwc->idx] = event;
>>> +
>>>  			if (hwc->state & PERF_HES_ARCH)
>>>  				continue;
>>>  
>>> @@ -1379,7 +1381,6 @@ static void x86_pmu_enable(struct pmu *pmu)
>>>  			 * if cpuc->enabled = 0, then no wrmsr as
>>>  			 * per x86_pmu_enable_event()
>>>  			 */
>>> -			cpuc->events[hwc->idx] = event;
>>>  			x86_pmu_start(event, PERF_EF_RELOAD);
>>>  		}
>>>  		cpuc->n_added = 0;
>> On second thought, it seems the change could slightly break the current
>> PEBS counter snapshot logic.
>>
>> Currently the function intel_perf_event_update_pmc() needs to filter out
>> these uninitialized counters by checking whether the event is NULL, as the
>> comments and code below show.
>>
>> ```
>>
>>      * - An event is stopped for some reason, e.g., throttled.
>>      *   During this period, another event is added and takes the
>>      *   counter of the stopped event. The stopped event is assigned
>>      *   to another new and uninitialized counter, since the
>>      *   x86_pmu_start(RELOAD) is not invoked for a stopped event.
>>      *   The PEBS_DATA_CFG is updated regardless of the event state.
>>      *   The uninitialized counter can be recorded in a PEBS record.
>>      *   But the cpuc->events[uninitialized_counter] is always NULL,
>>      *   because the event is stopped. The uninitialized value is
>>      *   safely dropped.
>>      */
>>     if (!event)
>>         return;
>>
>> ```
>>
>> Once we have this change, the original index of a stopped event could be
>> assigned to a new event. In this case, although the new event is still
>> not activated, cpuc->events[original_index] has been initialized and
>> won't be NULL. So intel_perf_event_update_pmc() could update the cached
>> count value for the wrong event.
>>
>> I suppose we have two ways to fix this issue.
>>
>> 1. Move "cpuc->events[idx] = event" into x86_pmu_start(), just like what
>> the v1 patch does.
> That's not what v1 did; v1 did an additional setting.

Oh, yeah. "move" is not an accurate word.


>
>> 2. Check cpuc->active_mask in intel_perf_event_update_pmc() as well. The
>> side effect is that the cached counter snapshots for the stopped events
>> have to be dropped, and there is no chance to update the count value for
>> these stopped events even when their HW indexes are not occupied by other
>> new events.
>>
>> Peter, how's your idea on this? Thanks.
> So you're saying that intel_perf_event_update_pmc() will be trying to
> read the hardware counter; which hasn't been written with a sensible
> value (and thus mis-behave) even though the event is STOPPED and the
> active_mask bit is unset?

Yes, currently intel_perf_event_update_pmc() only checks whether the
cpuc->events[x] (assume the counter index is "x") pointer is NULL before
updating the event count according to the counter snapshot in the PEBS
record. But after commit 7e772a93eb61 ("perf/x86: Fix NULL event access and
potential PEBS record loss"), cpuc->events[x] won't be cleared after the
event is STOPPED until the event is really deleted, so
intel_perf_event_update_pmc() could still update the event count for a
STOPPED event when draining buffered PEBS records.

Even worse, the original counter index "x" could be assigned to a new event
during rescheduling, and then the snapshotted count value could be applied
to the new event incorrectly.


>
> I'm thinking intel_perf_event_update_pmc() needs help either way around
> :-)

Sure. I would add an active_mask check there.
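
For concreteness, such a check might look roughly like the sketch below.
Only the `if (!event)` test is quoted from the current code earlier in this
message; the hunk placement, the `event->hw.idx` lookup, and the `cpuc`
context are assumptions about intel_perf_event_update_pmc(), not an actual
patch:

--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ intel_perf_event_update_pmc() (hypothetical context)
 	if (!event)
 		return;
 
+	/*
+	 * Hypothetical: also drop snapshots for counters whose
+	 * active_mask bit is clear (stopped events), so a stale PEBS
+	 * record cannot update a stopped or migrated event.
+	 */
+	if (!test_bit(event->hw.idx, cpuc->active_mask))
+		return;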



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-11 17:35     ` Peter Zijlstra
  2026-03-11 20:40       ` Peter Zijlstra
@ 2026-03-12  1:46       ` Mi, Dapeng
  1 sibling, 0 replies; 13+ messages in thread
From: Mi, Dapeng @ 2026-03-12  1:46 UTC (permalink / raw)
  To: Peter Zijlstra, Ian Rogers
  Cc: Breno Leitao, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-perf-users, linux-kernel, kernel-team,
	stable


On 3/12/2026 1:35 AM, Peter Zijlstra wrote:
> On Wed, Mar 11, 2026 at 09:37:08AM -0700, Ian Rogers wrote:
>
>> Fwiw, an AI review was saying something similar to me. 
> How do you guys get this AI stuff to say anything sensible at all about
> the code? Every time I try, it is telling me the most abject nonsense.
> I've wasted hours of my life trying to guide it to known bugs, in vain.

Based on my experience, the quality of AI-generated comments is impacted by
two factors: one is the AI model, the other is the prompt.

Currently I use GitHub Copilot (Claude model) to do some AI reviews. Claude
seems to be the best model for reviewing. Although most AI review comments
are not 100% accurate, it can indeed point out some potential issues or
give some clues about them.

About the prompt, there is a project,
https://github.com/masoncl/review-prompts, which gathers review prompts
for the Linux kernel. I learned a lot of prompts from it.


>
>> I wonder if the
>> issue with NMI storms exists already via another path:
>>
>> By populating cpuc->events[idx] here, even for events that skip
>> x86_pmu_start() due to the PERF_HES_ARCH check, can this lead to PEBS
>> data corruption?
>>
>> For Intel PEBS, intel_pmu_pebs_late_setup() iterates over cpuc->event_list
>> and enables PEBS_DATA_CFG for this counter index regardless of its stopped
>> state. If the PMU hardware generates PEBS records for this uninitialized
>> counter, or if there are leftover PEBS records from the counter's previous
>> occupant (since x86_pmu_stop() does not drain the PEBS buffer),
>> intel_perf_event_update_pmc() will now find a non-NULL event pointer.
>> Will it incorrectly process these leftover records and attribute them
>> to the stopped event?
> Yes.
>
>> Additionally, does this change leave the unthrottled event's hardware
>> counter uninitialized?
> Also yes.
>
>> When x86_pmu_enable() moves a throttled event to a new counter, it sets
>> PERF_HES_ARCH and skips calling x86_pmu_start(event, PERF_EF_RELOAD).
>> Later, when the timer tick unthrottles the event, it calls
>> perf_event_unthrottle(), which invokes event->pmu->start(event, 0).
>> In x86_pmu_start(), because flags == 0 (lacking PERF_EF_RELOAD),
>> x86_pmu_set_period() is skipped. Will the newly assigned hardware counter
>> be enabled without being programmed with the event's period, potentially
>> causing it to start from a garbage value and leading to incorrect sampling
>> intervals or NMI storms?
> Don't think you can get NMI storms this way. If the PMI trips, we'll do
> set_period and that should fix it up.
>
> So just wrong counting.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-11 20:40       ` Peter Zijlstra
@ 2026-03-12  2:53         ` Mi, Dapeng
  2026-03-13 13:23           ` Breno Leitao
  0 siblings, 1 reply; 13+ messages in thread
From: Mi, Dapeng @ 2026-03-12  2:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ian Rogers
  Cc: Breno Leitao, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-perf-users, linux-kernel, kernel-team,
	stable


On 3/12/2026 4:40 AM, Peter Zijlstra wrote:
> On Wed, Mar 11, 2026 at 06:35:09PM +0100, Peter Zijlstra wrote:
>>> Additionally, does this change leave the unthrottled event's hardware
>>> counter uninitialized?
>> Also yes.
> Something like so on top of things I suppose.
>
> ---
> Subject: x86/perf: Make sure to program the counter value for stopped events on migration
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Mar 11 21:29:14 CET 2026
>
> Both Mi Dapeng and Ian Rogers noted that not everything that sets HES_STOPPED
> is required to EF_UPDATE. Specifically the 'step 1' loop of rescheduling
> explicitly does EF_UPDATE to ensure the counter value is read.
>
> However, then 'step 2' simply leaves the new counter uninitialized when
> HES_STOPPED, even though, as noted above, the thing that stopped them might not
> be aware it needs to EF_RELOAD -- since it didn't EF_UPDATE on stop.
>
> One such location that is affected is throttling, throttle does pmu->stop(, 0);
> and unthrottle does pmu->start(, 0); possibly restarting an uninitialized counter.
>
> Fixes: a4eaf7f14675 ("perf: Rework the PMU methods")
> Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Reported-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/events/core.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1374,8 +1374,10 @@ static void x86_pmu_enable(struct pmu *p
>  
>  			cpuc->events[hwc->idx] = event;
>  
> -			if (hwc->state & PERF_HES_ARCH)
> +			if (hwc->state & PERF_HES_ARCH) {
> +				static_call(x86_pmu_set_period)(event);
>  				continue;
> +			}
>  
>  			/*
>  			 * if cpuc->enabled = 0, then no wrmsr as

LGTM.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>


>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-12  2:53         ` Mi, Dapeng
@ 2026-03-13 13:23           ` Breno Leitao
  2026-03-13 15:35             ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Breno Leitao @ 2026-03-13 13:23 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Peter Zijlstra, Ian Rogers, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-perf-users, linux-kernel,
	kernel-team, stable

On Thu, Mar 12, 2026 at 10:53:59AM +0800, Mi, Dapeng wrote:
> On 3/12/2026 4:40 AM, Peter Zijlstra wrote:
> > Subject: x86/perf: Make sure to program the counter value for stopped events on migration
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Wed Mar 11 21:29:14 CET 2026
> >
> > Both Mi Dapeng and Ian Rogers noted that not everything that sets HES_STOPPED
> > is required to EF_UPDATE. Specifically the 'step 1' loop of rescheduling
> > explicitly does EF_UPDATE to ensure the counter value is read.
> >
> > However, then 'step 2' simply leaves the new counter uninitialized when
> > HES_STOPPED, even though, as noted above, the thing that stopped them might not
> > be aware it needs to EF_RELOAD -- since it didn't EF_UPDATE on stop.
> >
> > One such location that is affected is throttling, throttle does pmu->stop(, 0);
> > and unthrottle does pmu->start(, 0); possibly restarting an uninitialized counter.
> >
> > Fixes: a4eaf7f14675 ("perf: Rework the PMU methods")
> > Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> > Reported-by: Ian Rogers <irogers@google.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/events/core.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -1374,8 +1374,10 @@ static void x86_pmu_enable(struct pmu *p
> >
> >  			cpuc->events[hwc->idx] = event;
> >
> > -			if (hwc->state & PERF_HES_ARCH)
> > +			if (hwc->state & PERF_HES_ARCH) {
> > +				static_call(x86_pmu_set_period)(event);
> >  				continue;
> > +			}
> >
> >  			/*
> >  			 * if cpuc->enabled = 0, then no wrmsr as
>
> LGTM.
>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

Thank you for the patch and the discussion. To confirm my understanding:
this patch should be applied on top of my v2 series to fully resolve the
issue, correct?

If so, would you prefer that I include both patches together in a single
series, or are you fine with them as-is?

Thanks,
--breno

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-13 13:23           ` Breno Leitao
@ 2026-03-13 15:35             ` Peter Zijlstra
  2026-03-13 16:57               ` Breno Leitao
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2026-03-13 15:35 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Mi, Dapeng, Ian Rogers, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-perf-users, linux-kernel,
	kernel-team, stable

On Fri, Mar 13, 2026 at 06:23:41AM -0700, Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 10:53:59AM +0800, Mi, Dapeng wrote:
> > On 3/12/2026 4:40 AM, Peter Zijlstra wrote:
> > > Subject: x86/perf: Make sure to program the counter value for stopped events on migration
> > > From: Peter Zijlstra <peterz@infradead.org>
> > > Date: Wed Mar 11 21:29:14 CET 2026
> > >
> > > Both Mi Dapeng and Ian Rogers noted that not everything that sets HES_STOPPED
> > > is required to EF_UPDATE. Specifically the 'step 1' loop of rescheduling
> > > explicitly does EF_UPDATE to ensure the counter value is read.
> > >
> > > However, then 'step 2' simply leaves the new counter uninitialized when
> > > HES_STOPPED, even though, as noted above, the thing that stopped them might not
> > > be aware it needs to EF_RELOAD -- since it didn't EF_UPDATE on stop.
> > >
> > > One such location that is affected is throttling, throttle does pmu->stop(, 0);
> > > and unthrottle does pmu->start(, 0); possibly restarting an uninitialized counter.
> > >
> > > Fixes: a4eaf7f14675 ("perf: Rework the PMU methods")
> > > Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> > > Reported-by: Ian Rogers <irogers@google.com>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  arch/x86/events/core.c |    4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > --- a/arch/x86/events/core.c
> > > +++ b/arch/x86/events/core.c
> > > @@ -1374,8 +1374,10 @@ static void x86_pmu_enable(struct pmu *p
> > >
> > >  			cpuc->events[hwc->idx] = event;
> > >
> > > -			if (hwc->state & PERF_HES_ARCH)
> > > +			if (hwc->state & PERF_HES_ARCH) {
> > > +				static_call(x86_pmu_set_period)(event);
> > >  				continue;
> > > +			}
> > >
> > >  			/*
> > >  			 * if cpuc->enabled = 0, then no wrmsr as
> >
> > LGTM.
> >
> > Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> 
> Thank you for the patch and the discussion. To confirm my understanding:
> this patch should be applied on top of my v2 series to fully resolve the
> issue, correct?
> 
> If so, would you prefer that I include both patches together in a single
> series, or are you fine with them as-is?

Yeah, I've got them in queue/perf/urgent and once the robot is done
chewing on them, I'll push them out into tip.

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-13 15:35             ` Peter Zijlstra
@ 2026-03-13 16:57               ` Breno Leitao
  0 siblings, 0 replies; 13+ messages in thread
From: Breno Leitao @ 2026-03-13 16:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mi, Dapeng, Ian Rogers, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-perf-users, linux-kernel,
	kernel-team, stable

On Fri, Mar 13, 2026 at 04:35:15PM +0100, Peter Zijlstra wrote:

> > Thank you for the patch and the discussion. To confirm my understanding:
> > this patch should be applied on top of my v2 series to fully resolve the
> > issue, correct?
> > 
> > If so, would you prefer that I include both patches together in a single
> > series, or are you fine with them as-is?
> 
> Yeah, I've got them in queue/perf/urgent and once the robot is done
> chewing on them, I'll push them out into tip.

Thanks Peter!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [tip: perf/urgent] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
  2026-03-10 10:13 [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable() Breno Leitao
  2026-03-11  2:04 ` Mi, Dapeng
@ 2026-03-16  9:50 ` tip-bot2 for Breno Leitao
  1 sibling, 0 replies; 13+ messages in thread
From: tip-bot2 for Breno Leitao @ 2026-03-16  9:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Breno Leitao, Peter Zijlstra (Intel), stable, x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     8d5fae6011260de209aaf231120e8146b14bc8e0
Gitweb:        https://git.kernel.org/tip/8d5fae6011260de209aaf231120e8146b14bc8e0
Author:        Breno Leitao <leitao@debian.org>
AuthorDate:    Tue, 10 Mar 2026 03:13:16 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 12 Mar 2026 11:29:15 +01:00

perf/x86: Move event pointer setup earlier in x86_pmu_enable()

A production AMD EPYC system crashed with a NULL pointer dereference
in the PMU NMI handler:

  BUG: kernel NULL pointer dereference, address: 0000000000000198
  RIP: x86_perf_event_update+0xc/0xa0
  Call Trace:
   <NMI>
   amd_pmu_v2_handle_irq+0x1a6/0x390
   perf_event_nmi_handler+0x24/0x40

The faulting instruction is `cmpq $0x0, 0x198(%rdi)` with RDI=0,
corresponding to the `if (unlikely(!hwc->event_base))` check in
x86_perf_event_update() where hwc = &event->hw and event is NULL.

drgn inspection of the vmcore on CPU 106 showed a mismatch between
cpuc->active_mask and cpuc->events[]:

  active_mask: 0x1e (bits 1, 2, 3, 4)
  events[1]:   0xff1100136cbd4f38  (valid)
  events[2]:   0x0                 (NULL, but active_mask bit 2 set)
  events[3]:   0xff1100076fd2cf38  (valid)
  events[4]:   0xff1100079e990a90  (valid)

The event that should occupy events[2] was found in event_list[2]
with hw.idx=2 and hw.state=0x0, confirming x86_pmu_start() had run
(which clears hw.state and sets active_mask) but events[2] was
never populated.

Another event (event_list[0]) had hw.state=0x7 (STOPPED|UPTODATE|ARCH),
showing it was stopped when the PMU rescheduled events, confirming the
throttle-then-reschedule sequence occurred.

The root cause is commit 7e772a93eb61 ("perf/x86: Fix NULL event access
and potential PEBS record loss") which moved the cpuc->events[idx]
assignment out of x86_pmu_start() and into step 2 of x86_pmu_enable(),
after the PERF_HES_ARCH check. This broke any path that calls
pmu->start() without going through x86_pmu_enable() -- specifically
the unthrottle path:

  perf_adjust_freq_unthr_events()
    -> perf_event_unthrottle_group()
      -> perf_event_unthrottle()
        -> event->pmu->start(event, 0)
          -> x86_pmu_start()     // sets active_mask but not events[]

The race sequence is:

  1. A group of perf events overflows, triggering group throttle via
     perf_event_throttle_group(). All events are stopped: active_mask
     bits cleared, events[] preserved (x86_pmu_stop no longer clears
     events[] after commit 7e772a93eb61).

  2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
     due to other scheduling activity. Stopped events that need to
     move counters get PERF_HES_ARCH set and events[old_idx] cleared.
     In step 2 of x86_pmu_enable(), PERF_HES_ARCH causes these events
     to be skipped -- events[new_idx] is never set.

  3. The timer tick unthrottles the group via pmu->start(). Since
     commit 7e772a93eb61 removed the events[] assignment from
     x86_pmu_start(), active_mask[new_idx] is set but events[new_idx]
     remains NULL.

  4. A PMC overflow NMI fires. The handler iterates active counters,
     finds active_mask[2] set, reads events[2] which is NULL, and
     crashes dereferencing it.

Move the cpuc->events[hwc->idx] assignment in x86_pmu_enable() to
before the PERF_HES_ARCH check, so that events[] is populated even
for events that are not immediately started. This ensures the
unthrottle path via pmu->start() always finds a valid event pointer.

Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260310-perf-v2-1-4a3156fce43c@debian.org
---
 arch/x86/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 03ce1bc..54b4c31 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1372,6 +1372,8 @@ static void x86_pmu_enable(struct pmu *pmu)
 			else if (i < n_running)
 				continue;
 
+			cpuc->events[hwc->idx] = event;
+
 			if (hwc->state & PERF_HES_ARCH)
 				continue;
 
@@ -1379,7 +1381,6 @@ static void x86_pmu_enable(struct pmu *pmu)
 			 * if cpuc->enabled = 0, then no wrmsr as
 			 * per x86_pmu_enable_event()
 			 */
-			cpuc->events[hwc->idx] = event;
 			x86_pmu_start(event, PERF_EF_RELOAD);
 		}
 		cpuc->n_added = 0;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-03-16  9:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-10 10:13 [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable() Breno Leitao
2026-03-11  2:04 ` Mi, Dapeng
2026-03-11 16:37   ` Ian Rogers
2026-03-11 17:35     ` Peter Zijlstra
2026-03-11 20:40       ` Peter Zijlstra
2026-03-12  2:53         ` Mi, Dapeng
2026-03-13 13:23           ` Breno Leitao
2026-03-13 15:35             ` Peter Zijlstra
2026-03-13 16:57               ` Breno Leitao
2026-03-12  1:46       ` Mi, Dapeng
2026-03-11 17:18   ` Peter Zijlstra
2026-03-12  1:05     ` Mi, Dapeng
2026-03-16  9:50 ` [tip: perf/urgent] " tip-bot2 for Breno Leitao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox