* [perf] crashing bug in icl_update_topdown_event
From: Vince Weaver @ 2025-06-11 14:45 UTC (permalink / raw)
To: linux-kernel, linux-perf-users
Cc: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter
Hello,
The perf_fuzzer found a hard-lock crash with current git on my RaptorLake
machine. It appears to be in icl_update_topdown_event().
This crashes the machine so hard that I had to take a picture with a
camera and transcribe the oops; let me know if there's missing info that
would help. (Also, what are the current best practices for getting dumps
like this on machines without serial ports for a serial console?)
This does seem to be reproducible, so I will try to investigate a bit more
too.
Vince
Oops: general protection fault, maybe for address 0xffff89aeceab400: 0000
CPU: 23 UID: 0 PID: 0 Comm: swapper/23
Tainted: [W]=WARN
Hardware name: Dell Inc. Precision 9660/0VJ762
RIP: 0010:native_read_pmc+0x7/0x40
Code: cc e8 8d a9 01 00 48 89 03 5b cd cc cc cc cc 0f 1f ...
RSP: 000:fffb03100273de8 EFLAGS: 00010046
....
Call Trace:
<TASK>
icl_update_topdown_event+0x165/0x190
? ktime_get+0x38/0xd0
intel_pmu_read_event+0xf9/0x210
__perf_event_read+0xf9/0x210
? __pfx___perf_event_read+0x10/0x10
__flush_smp_call_function_queue+0x37/0x70
do_idle+0x144/0x240
cpu_startup_entry+0x29/0x30
start_secondary+0x119/0x140
common_startup_64+0x13e/0x141
</TASK>
Modules linked in: ...
* Re: [perf] crashing bug in icl_update_topdown_event
From: Vince Weaver @ 2025-06-11 14:57 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, linux-perf-users, Liang, Kan, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On Wed, 11 Jun 2025, Vince Weaver wrote:
> Oops: general protection fault, maybe for address 0xffff89aeceab400: 0000
> CPU: 23 UID: 0 PID: 0 Comm: swapper/23
> Tainted: [W]=WARN
> Hardware name: Dell Inc. Precision 9660/0VJ762
> RIP: 0010:native_read_pmc+0x7/0x40
> Code: cc e8 8d a9 01 00 48 89 03 5b cd cc cc cc cc 0f 1f ...
> RSP: 000:fffb03100273de8 EFLAGS: 00010046
One additional note that's probably relevant: this is a hybrid CPU
machine, and CPUs 16-23 are Atom cores that don't support topdown.
So the crash is probably because, for whatever reason, the kernel is trying
to read topdown events on an unsupported core, which triggers a GPF.
Vince
* Re: [perf] crashing bug in icl_update_topdown_event
From: Liang, Kan @ 2025-06-11 18:53 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 2025-06-11 10:57 a.m., Vince Weaver wrote:
> On Wed, 11 Jun 2025, Vince Weaver wrote:
>
>> Oops: general protection fault, maybe for address 0xffff89aeceab400: 0000
>> CPU: 23 UID: 0 PID: 0 Comm: swapper/23
>> Tainted: [W]=WARN
>> Hardware name: Dell Inc. Precision 9660/0VJ762
>> RIP: 0010:native_read_pmc+0x7/0x40
>> Code: cc e8 8d a9 01 00 48 89 03 5b cd cc cc cc cc 0f 1f ...
>> RSP: 000:fffb03100273de8 EFLAGS: 00010046
>
> one additional note that's probably relevant, this is on a hybrid CPU
> machine, so CPUs 16-23 are atom cores that don't support topdown.
>
> So the crash is probably because for whatever reason the kernel is trying
> to read topdown events on an unsupported core and triggering a GPF.
>
It seems to be a regression from f9bdf1f95339 ("perf/x86/intel: Avoid
disable PMU if !cpuc->enabled in sample read").
The commit merged intel_pmu_auto_reload_read() and
intel_pmu_read_topdown_event(). It's possible for a PEBS event 0x0400
to run on an Atom CPU, in which case PERF_X86_EVENT_AUTO_RELOAD is set
for the event and is_topdown_event() also returns true. The read path
then invokes the topdown update on a core that doesn't support topdown.
Does the patch below help?
It also checks the PERF_X86_EVENT_TOPDOWN flag before invoking the
topdown functions.
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c60b6f199f51..67f80a683234 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2826,7 +2826,8 @@ static void intel_pmu_read_event(struct perf_event *event)
 		 * If the PEBS counters snapshotting is enabled,
 		 * the topdown event is available in PEBS records.
 		 */
-		if (is_topdown_event(event) && !is_pebs_counter_event_group(event))
+		if (is_topdown_count(event) && is_topdown_event(event) &&
+		    !is_pebs_counter_event_group(event))
 			static_call(intel_pmu_update_topdown_event)(event, NULL);
 		else
 			intel_pmu_drain_pebs_buffer();
Thanks,
Kan
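The reasoning in the message above can be sketched as a small user-space model. This is not kernel code: the struct, the `MODEL_*` constants, and the `model_*`/`old_*`/`new_*` function names are hypothetical stand-ins chosen for illustration, and the config-only test is simplified (the real is_topdown_event() may match more encodings than the single 0x0400 value mentioned above). The sketch only shows why a check on the event encoding alone can select the topdown path for an event that was never set up as a topdown event, and how additionally requiring the flag avoids that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the kernel sources. */
#define MODEL_EVENT_MASK        0xffffULL  /* event select + umask bits */
#define MODEL_TD_SLOTS_CONFIG   0x0400ULL  /* the 0x0400 encoding from the thread */
#define MODEL_FLAG_AUTO_RELOAD  (1u << 0)  /* models PERF_X86_EVENT_AUTO_RELOAD */
#define MODEL_FLAG_TOPDOWN      (1u << 1)  /* models PERF_X86_EVENT_TOPDOWN */

struct model_event {
	uint64_t config;     /* raw event encoding requested by the user */
	unsigned int flags;  /* flags assigned when the event was set up */
};

/* Config-only test: true for any event whose encoding is 0x0400. */
static bool model_is_topdown_event(const struct model_event *e)
{
	return (e->config & MODEL_EVENT_MASK) == MODEL_TD_SLOTS_CONFIG;
}

/* Flag test: true only if the event was set up as a topdown event. */
static bool model_is_topdown_count(const struct model_event *e)
{
	return (e->flags & MODEL_FLAG_TOPDOWN) != 0;
}

/* Condition before the proposed patch (PEBS group check omitted). */
static bool old_takes_topdown_path(const struct model_event *e)
{
	return model_is_topdown_event(e);
}

/* Condition after the proposed patch (PEBS group check omitted). */
static bool new_takes_topdown_path(const struct model_event *e)
{
	return model_is_topdown_count(e) && model_is_topdown_event(e);
}
```

In this model, an event with config 0x0400 and only the auto-reload flag set (the Atom PEBS case described above) takes the topdown path under the old condition but not under the new one, while an event with the same config and the topdown flag set takes it under both.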
* Re: [perf] crashing bug in icl_update_topdown_event
From: Vince Weaver @ 2025-06-11 19:28 UTC (permalink / raw)
To: Liang, Kan
Cc: Vince Weaver, linux-kernel, linux-perf-users, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On Wed, 11 Jun 2025, Liang, Kan wrote:
>
>
> It seems to be a regression from f9bdf1f95339 ("perf/x86/intel: Avoid
> disable PMU if !cpuc->enabled in sample read").
> The commit merged intel_pmu_auto_reload_read() and
> intel_pmu_read_topdown_event(). It's possible for a PEBS event 0x0400
> to run on an Atom CPU, in which case PERF_X86_EVENT_AUTO_RELOAD is set
> for the event and is_topdown_event() also returns true.
>
> Does the below patch help?
> It checks the PERF_X86_EVENT_TOPDOWN flag as well before invoking the
> topdown functions.
With this patch applied my test case no longer crashes.
Thanks,
Vince Weaver
vincent.weaver@maine.edu
* Re: [perf] crashing bug in icl_update_topdown_event
From: Liang, Kan @ 2025-06-12 14:47 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On 2025-06-11 3:28 p.m., Vince Weaver wrote:
> On Wed, 11 Jun 2025, Liang, Kan wrote:
>
>>
>>
>> It seems to be a regression from f9bdf1f95339 ("perf/x86/intel: Avoid
>> disable PMU if !cpuc->enabled in sample read").
>> The commit merged intel_pmu_auto_reload_read() and
>> intel_pmu_read_topdown_event(). It's possible for a PEBS event 0x0400
>> to run on an Atom CPU, in which case PERF_X86_EVENT_AUTO_RELOAD is set
>> for the event and is_topdown_event() also returns true.
>>
>> Does the below patch help?
>> It checks the PERF_X86_EVENT_TOPDOWN flag as well before invoking the
>> topdown functions.
>
> With this patch applied my test case no longer crashes.
>
Thanks for the test. I've posted a fix to LKML.
https://lore.kernel.org/linux-perf-users/20250612143818.2889040-1-kan.liang@linux.intel.com/
It's a little bit different from the one tested. Please check and
provide a 'tested-by' on the new patch if it works.
Thanks,
Kan
* Re: [perf] crashing bug in icl_update_topdown_event
From: Vince Weaver @ 2025-06-12 16:08 UTC (permalink / raw)
To: Liang, Kan
Cc: Vince Weaver, linux-kernel, linux-perf-users, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter
On Thu, 12 Jun 2025, Liang, Kan wrote:
> Thanks for the test. I've posted a fix to LKML.
> https://lore.kernel.org/linux-perf-users/20250612143818.2889040-1-kan.liang@linux.intel.com/
>
> It's a little bit different from the one tested. Please check and
> provide a 'tested-by' on the new patch if it works.
>
> Thanks,
> Kan
I've tested the new patch and it also runs my testcase without crashing.
Tested-by: Vince Weaver <vincent.weaver@maine.edu>