public inbox for stable@vger.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
Cc: Breno Leitao <leitao@debian.org>, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	James Clark <james.clark@linaro.org>,
	Thomas Gleixner <tglx@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, stable@vger.kernel.org
Subject: Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
Date: Wed, 11 Mar 2026 18:18:50 +0100
Message-ID: <20260311171850.GQ606826@noisy.programming.kicks-ass.net>
In-Reply-To: <a0a1d8ab-85cd-411c-b8e2-9e7e2f7136fd@linux.intel.com>

On Wed, Mar 11, 2026 at 10:04:10AM +0800, Mi, Dapeng wrote:
> 
> On 3/10/2026 6:13 PM, Breno Leitao wrote:
> > A production AMD EPYC system crashed with a NULL pointer dereference
> > in the PMU NMI handler:
> >
> >   BUG: kernel NULL pointer dereference, address: 0000000000000198
> >   RIP: x86_perf_event_update+0xc/0xa0
> >   Call Trace:
> >    <NMI>
> >    amd_pmu_v2_handle_irq+0x1a6/0x390
> >    perf_event_nmi_handler+0x24/0x40
> >
> > The faulting instruction is `cmpq $0x0, 0x198(%rdi)` with RDI=0,
> > corresponding to the `if (unlikely(!hwc->event_base))` check in
> > x86_perf_event_update() where hwc = &event->hw and event is NULL.
> >
> > drgn inspection of the vmcore on CPU 106 showed a mismatch between
> > cpuc->active_mask and cpuc->events[]:
> >
> >   active_mask: 0x1e (bits 1, 2, 3, 4)
> >   events[1]:   0xff1100136cbd4f38  (valid)
> >   events[2]:   0x0                 (NULL, but active_mask bit 2 set)
> >   events[3]:   0xff1100076fd2cf38  (valid)
> >   events[4]:   0xff1100079e990a90  (valid)
> >
> > The event that should occupy events[2] was found in event_list[2]
> > with hw.idx=2 and hw.state=0x0, confirming x86_pmu_start() had run
> > (which clears hw.state and sets active_mask) but events[2] was
> > never populated.
> >
> > Another event (event_list[0]) had hw.state=0x7 (STOPPED|UPTODATE|ARCH),
> > showing it was stopped when the PMU rescheduled events, confirming the
> > throttle-then-reschedule sequence occurred.
> >
> > The root cause is commit 7e772a93eb61 ("perf/x86: Fix NULL event access
> > and potential PEBS record loss") which moved the cpuc->events[idx]
> > assignment out of x86_pmu_start() and into step 2 of x86_pmu_enable(),
> > after the PERF_HES_ARCH check. This broke any path that calls
> > pmu->start() without going through x86_pmu_enable() -- specifically
> > the unthrottle path:
> >
> >   perf_adjust_freq_unthr_events()
> >     -> perf_event_unthrottle_group()
> >       -> perf_event_unthrottle()
> >         -> event->pmu->start(event, 0)
> >           -> x86_pmu_start()     // sets active_mask but not events[]
> >
> > The race sequence is:
> >
> >   1. A group of perf events overflows, triggering group throttle via
> >      perf_event_throttle_group(). All events are stopped: active_mask
> >      bits cleared, events[] preserved (x86_pmu_stop no longer clears
> >      events[] after commit 7e772a93eb61).
> >
> >   2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
> >      due to other scheduling activity. Stopped events that need to
> >      move counters get PERF_HES_ARCH set and events[old_idx] cleared.
> >      In step 2 of x86_pmu_enable(), PERF_HES_ARCH causes these events
> >      to be skipped -- events[new_idx] is never set.
> >
> >   3. The timer tick unthrottles the group via pmu->start(). Since
> >      commit 7e772a93eb61 removed the events[] assignment from
> >      x86_pmu_start(), active_mask[new_idx] is set but events[new_idx]
> >      remains NULL.
> >
> >   4. A PMC overflow NMI fires. The handler iterates active counters,
> >      finds active_mask[2] set, reads events[2] which is NULL, and
> >      crashes dereferencing it.
> >
> > Move the cpuc->events[hwc->idx] assignment in x86_pmu_enable() to
> > before the PERF_HES_ARCH check, so that events[] is populated even
> > for events that are not immediately started. This ensures the
> > unthrottle path via pmu->start() always finds a valid event pointer.
> >
> > Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > Cc: stable@vger.kernel.org
> > ---
> > Changes in v2:
> > - Move event pointer setup earlier in x86_pmu_enable() (peterz)
> > - Rewrote the patch title, given the new approach
> > - Link to v1: https://patch.msgid.link/20260309-perf-v1-1-601ffb531893@debian.org
> > ---
> >  arch/x86/events/core.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index 03ce1bc7ef2ea..54b4c315d927f 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -1372,6 +1372,8 @@ static void x86_pmu_enable(struct pmu *pmu)
> >  			else if (i < n_running)
> >  				continue;
> >  
> > +			cpuc->events[hwc->idx] = event;
> > +
> >  			if (hwc->state & PERF_HES_ARCH)
> >  				continue;
> >  
> > @@ -1379,7 +1381,6 @@ static void x86_pmu_enable(struct pmu *pmu)
> >  			 * if cpuc->enabled = 0, then no wrmsr as
> >  			 * per x86_pmu_enable_event()
> >  			 */
> > -			cpuc->events[hwc->idx] = event;
> >  			x86_pmu_start(event, PERF_EF_RELOAD);
> >  		}
> >  		cpuc->n_added = 0;
> 
> On second thought, this change could slightly break the current PEBS
> counter snapshot logic.
> 
> Currently, intel_perf_event_update_pmc() needs to filter out these
> uninitialized counters by checking whether the event is NULL, as the
> comment and code below show.
> 
> ```
> 
>      * - An event is stopped for some reason, e.g., throttled.
>      *   During this period, another event is added and takes the
>      *   counter of the stopped event. The stopped event is assigned
>      *   to another new and uninitialized counter, since the
>      *   x86_pmu_start(RELOAD) is not invoked for a stopped event.
>      *   The PEBS__DATA_CFG is updated regardless of the event state.
>      *   The uninitialized counter can be recorded in a PEBS record.
>      *   But the cpuc->events[uninitialized_counter] is always NULL,
>      *   because the event is stopped. The uninitialized value is
>      *   safely dropped.
>      */
>     if (!event)
>         return;
> 
> ```
> 
> With this change, the original index of a stopped event could be
> assigned to a new event. In this case, although the new event is not yet
> activated, cpuc->events[original_index] has already been initialized and
> won't be NULL, so intel_perf_event_update_pmc() could update the cached
> count value of the wrong event.
> 
> I suppose we have two ways to fix this issue.
> 
> 1. Move "cpuc->events[idx] = event" into x86_pmu_start(), just as the
> v1 patch does.

That's not what v1 did; v1 added an additional assignment.

> 2. Also check cpuc->active_mask in intel_perf_event_update_pmc(). The
> side effect is that the cached counter snapshots of stopped events have
> to be dropped, so the count values of these stopped events can never be
> updated, even when their HW counter indices are not occupied by other new
> events.
> 
> Peter, what do you think? Thanks.

So you're saying that intel_perf_event_update_pmc() will try to read a
hardware counter that hasn't been written with a sensible value (and thus
misbehave), even though the event is STOPPED and its active_mask bit is
unset?

I'm thinking intel_perf_event_update_pmc() needs help either way around
:-)


Thread overview: 13+ messages
2026-03-10 10:13 [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable() Breno Leitao
2026-03-11  2:04 ` Mi, Dapeng
2026-03-11 16:37   ` Ian Rogers
2026-03-11 17:35     ` Peter Zijlstra
2026-03-11 20:40       ` Peter Zijlstra
2026-03-12  2:53         ` Mi, Dapeng
2026-03-13 13:23           ` Breno Leitao
2026-03-13 15:35             ` Peter Zijlstra
2026-03-13 16:57               ` Breno Leitao
2026-03-12  1:46       ` Mi, Dapeng
2026-03-11 17:18   ` Peter Zijlstra [this message]
2026-03-12  1:05     ` Mi, Dapeng
2026-03-16  9:50 ` [tip: perf/urgent] " tip-bot2 for Breno Leitao
