* [PATCH] perf/core: Fix pending work re-queued in __perf_event_overflow
@ 2025-11-09 10:32 Liangyan
2025-11-09 11:45 ` Peter Zijlstra
2025-11-09 16:41 ` [PATCH v2] " Liangyan
0 siblings, 2 replies; 7+ messages in thread
From: Liangyan @ 2025-11-09 10:32 UTC (permalink / raw)
To: peterz, mingo
Cc: acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, bigeasy, zengxianjun,
linux-perf-users, linux-kernel, Liangyan
We got the warning below during a perf test.
[ 467.100914] [ T1] WARNING: CPU: 0 PID: 1 at kernel/events/core.c:5147 put_pmu_ctx+0x2ef/0x3c0
[ 467.107702] [ T1] CPU: 0 UID: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G E 6.18.0-rc4-dirty #114 PREEMPT(voluntary)
[ 467.109835] [ T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1
[ 467.111027] [ T1] RIP: 0010:put_pmu_ctx+0x2ef/0x3c0
[ 467.122081] [ T1] Call Trace:
[ 467.122463] [ T1] <TASK>
[ 467.124822] [ T1] __free_event+0x337/0xa50
[ 467.125306] [ T1] perf_pending_task+0x10f/0x3b0
[ 467.125824] [ T1] task_work_run+0x140/0x210
[ 467.127413] [ T1] exit_to_user_mode_loop+0x10e/0x130
[ 467.127965] [ T1] do_syscall_64+0x26d/0x2e0
[ 467.128453] [ T1] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 467.129025] [ T1] RIP: 0033:0x7f01d22349ca
[ 467.135157] [ T1] </TASK>
A race condition occurs between task context and IRQ context when
handling sigtrap tracepoint event overflows:
1. In task context, an event overflows and its pending work is
queued to task->task_works.
2. Before pending_work is set, the same event overflows in IRQ context.
3. Both contexts queue the same perf pending work to task->task_works.
This double queuing causes:
- task_work_run() enters an infinite loop calling perf_pending_task()
- Potential warnings and use-after-free when the event is freed in
perf_pending_task()
Fix the race by disabling interrupts while the perf pending work is queued.
The call trace of the re-queued pending work looks like this:
[ 466.979877] [ C0] CPU: 0 UID: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G E 6.18.0-rc4-dirty #114 PREEMPT(voluntary)
[ 466.979889] [ C0] Tainted: [E]=UNSIGNED_MODULE
[ 466.979892] [ C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1
[ 466.979897] [ C0] Call Trace:
[ 466.979909] [ C0] <IRQ>
[ 466.979913] [ C0] dump_stack_lvl+0x88/0xb0
[ 466.979924] [ C0] __perf_event_overflow+0xb4f/0xcb0
[ 466.979972] [ C0] perf_swevent_event+0x230/0x340
[ 466.979980] [ C0] perf_tp_event+0x412/0x910
[ 466.980355] [ C0] perf_trace_run_bpf_submit+0x103/0x190
[ 466.980363] [ C0] perf_trace_kmem_cache_alloc+0x156/0x1b0
[ 466.980374] [ C0] kmem_cache_alloc_noprof+0x214/0x600
[ 466.980383] [ C0] __alloc_object+0x2f/0x2d0
[ 466.980392] [ C0] __create_object+0x22/0x90
[ 466.980402] [ C0] kmem_cache_alloc_node_noprof+0x39d/0x620
[ 466.980419] [ C0] kmalloc_reserve+0x167/0x280
[ 466.980428] [ C0] __alloc_skb+0x12e/0x330
[ 466.980466] [ C0] napi_alloc_skb+0x147/0x270
[ 466.980473] [ C0] page_to_skb+0x171/0xaa0 [virtio_net]
[ 466.980498] [ C0] receive_buf+0x7c9/0x3ae0 [virtio_net]
[ 466.980637] [ C0] virtnet_poll+0xb98/0x3160 [virtio_net]
[ 466.980749] [ C0] __napi_poll+0xb0/0x5c0
[ 466.980756] [ C0] net_rx_action+0x416/0xbb0
[ 466.980809] [ C0] handle_softirqs+0x186/0x5d0
[ 466.980818] [ C0] __irq_exit_rcu+0x13f/0x180
[ 466.980826] [ C0] common_interrupt+0x7a/0xa0
[ 466.980834] [ C0] </IRQ>
[ 466.980836] [ C0] <TASK>
[ 466.980839] [ C0] asm_common_interrupt+0x22/0x40
[ 466.980846] [ C0] RIP: 0010:delay_tsc+0x3a/0xa0
[ 466.980854] [ C0] Code: 44 8b 05 ad 59 90 02 0f 01 f9 66 90 48 c1 e2 20 48 89 d7 48 09 c7 eb 21 65 ff 0d 91 59 90 02 74 57 f3 90 65 ff 05 86 59 90 02 <65> 8b 35 83 59 90 02 41 39 f0 75 28 41 89 f0 0f 01 f9 66 90 48 c1
[ 466.980862] [ C0] RSP: 0018:ff110001009477e0 EFLAGS: 00000283
[ 466.980869] [ C0] RAX: 000000fea17a6aa9 RBX: ff11000100a8c500 RCX: 0000000000000000
[ 466.980874] [ C0] RDX: 000000000026df92 RSI: 0000000000000000 RDI: 000000fea1538b17
[ 466.980878] [ C0] RBP: ff11001b3a03bb20 R08: 0000000000000000 R09: 0000000000af7a2e
[ 466.980883] [ C0] R10: ff1100010093cc87 R11: 000000006a14397b R12: 1fe2200020128f03
[ 466.980887] [ C0] R13: ff11000100a8c64c R14: ff11000100a8c600 R15: ff11000100a8c82c
[ 466.980894] [ C0] __perf_event_overflow+0x783/0xcb0
[ 466.980937] [ C0] perf_swevent_event+0x230/0x340
[ 466.980944] [ C0] perf_tp_event+0x412/0x910
[ 466.981200] [ C0] perf_trace_run_bpf_submit+0x103/0x190
[ 466.981212] [ C0] perf_trace_kmem_cache_alloc+0x156/0x1b0
[ 466.981223] [ C0] kmem_cache_alloc_noprof+0x214/0x600
[ 466.981232] [ C0] __alloc_object+0x2f/0x2d0
[ 466.981241] [ C0] __create_object+0x22/0x90
[ 466.981251] [ C0] __kmalloc_cache_noprof+0x405/0x640
[ 466.981283] [ C0] kmem_cache_free+0x18a/0x630
[ 466.981300] [ C0] __fput+0x5c4/0xa70
[ 466.981310] [ C0] fput_close_sync+0xf2/0x1f0
[ 466.981336] [ C0] __x64_sys_close+0x88/0xf0
[ 466.981344] [ C0] do_syscall_64+0x60/0x2e0
[ 466.981351] [ C0] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 466.981357] [ C0] RIP: 0033:0x7f01d22349ca
[ 466.981399] [ C0] </TASK>
Fixes: c5d93d23a260 ("perf: Enqueue SIGTRAP always via task_work.")
Reported-by: Xianjun Zeng <zengxianjun@bytedance.com>
Signed-off-by: Liangyan <liangyan.peng@bytedance.com>
---
kernel/events/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cae921f4d137..6c35a129f185 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10427,12 +10427,14 @@ static int __perf_event_overflow(struct perf_event *event,
bool valid_sample = sample_is_allowed(event, regs);
unsigned int pending_id = 1;
enum task_work_notify_mode notify_mode;
+ unsigned long flags;
if (regs)
pending_id = hash32_ptr((void *)instruction_pointer(regs)) ?: 1;
notify_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
+ local_irq_save(flags);
if (!event->pending_work &&
!task_work_add(current, &event->pending_task, notify_mode)) {
event->pending_work = pending_id;
@@ -10458,6 +10460,7 @@ static int __perf_event_overflow(struct perf_event *event,
*/
WARN_ON_ONCE(event->pending_work != pending_id);
}
+ local_irq_restore(flags);
}
READ_ONCE(event->overflow_handler)(event, data, regs);
--
2.39.3 (Apple Git-145)
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/core: Fix pending work re-queued in __perf_event_overflow
2025-11-09 10:32 [PATCH] perf/core: Fix pending work re-queued in __perf_event_overflow Liangyan
@ 2025-11-09 11:45 ` Peter Zijlstra
2025-11-09 16:41 ` [PATCH v2] " Liangyan
1 sibling, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2025-11-09 11:45 UTC (permalink / raw)
To: Liangyan
Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, james.clark, bigeasy, zengxianjun,
linux-perf-users, linux-kernel
On Sun, Nov 09, 2025 at 06:32:53PM +0800, Liangyan wrote:
> A race condition occurs between task context and IRQ context when
> handling sigtrap tracepoint event overflows:
>
> 1. In task context, an event overflows and its pending work is
> queued to task->task_works.
> 2. Before pending_work is set, the same event overflows in IRQ context.
> 3. Both contexts queue the same perf pending work to task->task_works.
>
> This double queuing causes:
> - task_work_run() enters an infinite loop calling perf_pending_task()
> - Potential warnings and use-after-free when the event is freed in
> perf_pending_task()
>
> Fix the race by disabling interrupts while the perf pending work is queued.
> Fixes: c5d93d23a260 ("perf: Enqueue SIGTRAP always via task_work.")
> Reported-by: Xianjun Zeng <zengxianjun@bytedance.com>
> Signed-off-by: Liangyan <liangyan.peng@bytedance.com>
> ---
> kernel/events/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index cae921f4d137..6c35a129f185 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10427,12 +10427,14 @@ static int __perf_event_overflow(struct perf_event *event,
> bool valid_sample = sample_is_allowed(event, regs);
> unsigned int pending_id = 1;
> enum task_work_notify_mode notify_mode;
> + unsigned long flags;
>
> if (regs)
> pending_id = hash32_ptr((void *)instruction_pointer(regs)) ?: 1;
>
> notify_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
>
> + local_irq_save(flags);
This could be written as:
/*
* Comment that explains why we need to disable IRQs.
*/
guard(irqsave)();
> if (!event->pending_work &&
> !task_work_add(current, &event->pending_task, notify_mode)) {
> event->pending_work = pending_id;
> @@ -10458,6 +10460,7 @@ static int __perf_event_overflow(struct perf_event *event,
> */
> WARN_ON_ONCE(event->pending_work != pending_id);
> }
> + local_irq_restore(flags);
> }
>
> READ_ONCE(event->overflow_handler)(event, data, regs);
> --
> 2.39.3 (Apple Git-145)
>
* [PATCH v2] perf/core: Fix pending work re-queued in __perf_event_overflow
2025-11-09 10:32 [PATCH] perf/core: Fix pending work re-queued in __perf_event_overflow Liangyan
2025-11-09 11:45 ` Peter Zijlstra
@ 2025-11-09 16:41 ` Liangyan
2025-11-11 13:30 ` Sebastian Andrzej Siewior
2025-11-14 3:33 ` [PATCH v3] " Liangyan
1 sibling, 2 replies; 7+ messages in thread
From: Liangyan @ 2025-11-09 16:41 UTC (permalink / raw)
To: peterz, mingo
Cc: acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, bigeasy, zengxianjun,
linux-perf-users, linux-kernel, Liangyan
We got the warning below during a perf test.
[ 467.100914] [ T1] WARNING: CPU: 0 PID: 1 at kernel/events/core.c:5147 put_pmu_ctx+0x2ef/0x3c0
[ 467.107702] [ T1] CPU: 0 UID: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G E 6.18.0-rc4-dirty #114 PREEMPT(voluntary)
[ 467.109835] [ T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1
[ 467.111027] [ T1] RIP: 0010:put_pmu_ctx+0x2ef/0x3c0
[ 467.122081] [ T1] Call Trace:
[ 467.122463] [ T1] <TASK>
[ 467.124822] [ T1] __free_event+0x337/0xa50
[ 467.125306] [ T1] perf_pending_task+0x10f/0x3b0
[ 467.125824] [ T1] task_work_run+0x140/0x210
[ 467.127413] [ T1] exit_to_user_mode_loop+0x10e/0x130
[ 467.127965] [ T1] do_syscall_64+0x26d/0x2e0
[ 467.128453] [ T1] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 467.129025] [ T1] RIP: 0033:0x7f01d22349ca
[ 467.135157] [ T1] </TASK>
A race condition occurs between task context and IRQ context when
handling sigtrap tracepoint event overflows:
1. In task context, an event overflows and its pending work is
queued to task->task_works.
2. Before pending_work is set, the same event overflows in IRQ context.
3. Both contexts queue the same perf pending work to task->task_works.
This double queuing causes:
- task_work_run() enters an infinite loop calling perf_pending_task()
- Potential warnings and use-after-free when the event is freed in
perf_pending_task()
Fix the race by disabling interrupts while the perf pending work is queued.
The call trace of the re-queued pending work looks like this:
[ 466.979877] [ C0] CPU: 0 UID: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G E 6.18.0-rc4-dirty #114 PREEMPT(voluntary)
[ 466.979889] [ C0] Tainted: [E]=UNSIGNED_MODULE
[ 466.979892] [ C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1
[ 466.979897] [ C0] Call Trace:
[ 466.979909] [ C0] <IRQ>
[ 466.979913] [ C0] dump_stack_lvl+0x88/0xb0
[ 466.979924] [ C0] __perf_event_overflow+0xb4f/0xcb0
[ 466.979972] [ C0] perf_swevent_event+0x230/0x340
[ 466.979980] [ C0] perf_tp_event+0x412/0x910
[ 466.980355] [ C0] perf_trace_run_bpf_submit+0x103/0x190
[ 466.980363] [ C0] perf_trace_kmem_cache_alloc+0x156/0x1b0
[ 466.980374] [ C0] kmem_cache_alloc_noprof+0x214/0x600
[ 466.980383] [ C0] __alloc_object+0x2f/0x2d0
[ 466.980392] [ C0] __create_object+0x22/0x90
[ 466.980402] [ C0] kmem_cache_alloc_node_noprof+0x39d/0x620
[ 466.980419] [ C0] kmalloc_reserve+0x167/0x280
[ 466.980428] [ C0] __alloc_skb+0x12e/0x330
[ 466.980466] [ C0] napi_alloc_skb+0x147/0x270
[ 466.980473] [ C0] page_to_skb+0x171/0xaa0 [virtio_net]
[ 466.980498] [ C0] receive_buf+0x7c9/0x3ae0 [virtio_net]
[ 466.980637] [ C0] virtnet_poll+0xb98/0x3160 [virtio_net]
[ 466.980749] [ C0] __napi_poll+0xb0/0x5c0
[ 466.980756] [ C0] net_rx_action+0x416/0xbb0
[ 466.980809] [ C0] handle_softirqs+0x186/0x5d0
[ 466.980818] [ C0] __irq_exit_rcu+0x13f/0x180
[ 466.980826] [ C0] common_interrupt+0x7a/0xa0
[ 466.980834] [ C0] </IRQ>
[ 466.980836] [ C0] <TASK>
[ 466.980839] [ C0] asm_common_interrupt+0x22/0x40
[ 466.980846] [ C0] RIP: 0010:delay_tsc+0x3a/0xa0
[ 466.980854] [ C0] Code: 44 8b 05 ad 59 90 02 0f 01 f9 66 90 48 c1 e2 20 48 89 d7 48 09 c7 eb 21 65 ff 0d 91 59 90 02 74 57 f3 90 65 ff 05 86 59 90 02 <65> 8b 35 83 59 90 02 41 39 f0 75 28 41 89 f0 0f 01 f9 66 90 48 c1
[ 466.980862] [ C0] RSP: 0018:ff110001009477e0 EFLAGS: 00000283
[ 466.980869] [ C0] RAX: 000000fea17a6aa9 RBX: ff11000100a8c500 RCX: 0000000000000000
[ 466.980874] [ C0] RDX: 000000000026df92 RSI: 0000000000000000 RDI: 000000fea1538b17
[ 466.980878] [ C0] RBP: ff11001b3a03bb20 R08: 0000000000000000 R09: 0000000000af7a2e
[ 466.980883] [ C0] R10: ff1100010093cc87 R11: 000000006a14397b R12: 1fe2200020128f03
[ 466.980887] [ C0] R13: ff11000100a8c64c R14: ff11000100a8c600 R15: ff11000100a8c82c
[ 466.980894] [ C0] __perf_event_overflow+0x783/0xcb0
[ 466.980937] [ C0] perf_swevent_event+0x230/0x340
[ 466.980944] [ C0] perf_tp_event+0x412/0x910
[ 466.981200] [ C0] perf_trace_run_bpf_submit+0x103/0x190
[ 466.981212] [ C0] perf_trace_kmem_cache_alloc+0x156/0x1b0
[ 466.981223] [ C0] kmem_cache_alloc_noprof+0x214/0x600
[ 466.981232] [ C0] __alloc_object+0x2f/0x2d0
[ 466.981241] [ C0] __create_object+0x22/0x90
[ 466.981251] [ C0] __kmalloc_cache_noprof+0x405/0x640
[ 466.981283] [ C0] kmem_cache_free+0x18a/0x630
[ 466.981300] [ C0] __fput+0x5c4/0xa70
[ 466.981310] [ C0] fput_close_sync+0xf2/0x1f0
[ 466.981336] [ C0] __x64_sys_close+0x88/0xf0
[ 466.981344] [ C0] do_syscall_64+0x60/0x2e0
[ 466.981351] [ C0] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 466.981357] [ C0] RIP: 0033:0x7f01d22349ca
[ 466.981399] [ C0] </TASK>
Fixes: c5d93d23a260 ("perf: Enqueue SIGTRAP always via task_work.")
Reported-by: Xianjun Zeng <zengxianjun@bytedance.com>
Signed-off-by: Liangyan <liangyan.peng@bytedance.com>
---
v2: Use guard(irqsave)(), as suggested by Peter.
---
kernel/events/core.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cae921f4d137..7c63e5fdd334 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10433,6 +10433,16 @@ static int __perf_event_overflow(struct perf_event *event,
notify_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
+ /*
+ * Task context queues the work via task_work_add() but has not yet
+ * set event->pending_work when the same event overflows in
+ * IRQ context. The IRQ path, seeing !event->pending_work,
+ * queues the work again.
+ * The double queuing causes corruption in task->task_works.
+ * Prevent this by disabling interrupts around the critical section.
+ */
+ guard(irqsave)();
+
if (!event->pending_work &&
!task_work_add(current, &event->pending_task, notify_mode)) {
event->pending_work = pending_id;
--
2.39.3 (Apple Git-145)
* Re: [PATCH v2] perf/core: Fix pending work re-queued in __perf_event_overflow
2025-11-09 16:41 ` [PATCH v2] " Liangyan
@ 2025-11-11 13:30 ` Sebastian Andrzej Siewior
2025-11-12 3:28 ` [External] " Liangyan
2025-11-14 3:33 ` [PATCH v3] " Liangyan
1 sibling, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-11 13:30 UTC (permalink / raw)
To: Liangyan
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, james.clark, zengxianjun,
linux-perf-users, linux-kernel
On 2025-11-10 00:41:22 [+0800], Liangyan wrote:
> We got the warning below during a perf test.
…
>
> A race condition occurs between task context and IRQ context when
> handling sigtrap tracepoint event overflows:
>
> 1. In task context, an event overflows and its pending work is
> queued to task->task_works.
> 2. Before pending_work is set, the same event overflows in IRQ context.
> 3. Both contexts queue the same perf pending work to task->task_works.
>
> This double queuing causes:
> - task_work_run() enters an infinite loop calling perf_pending_task()
> - Potential warnings and use-after-free when the event is freed in
> perf_pending_task()
>
> Fix the race by disabling interrupts while the perf pending work is queued.
Makes sense. Let's hope it does not overflow in NMI, too.
Either way, I suggest trimming the commit message and removing the
backtrace, as it probably does not add any value.
Sebastian
* Re: [External] Re: [PATCH v2] perf/core: Fix pending work re-queued in __perf_event_overflow
2025-11-11 13:30 ` Sebastian Andrzej Siewior
@ 2025-11-12 3:28 ` Liangyan
0 siblings, 0 replies; 7+ messages in thread
From: Liangyan @ 2025-11-12 3:28 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Liangyan
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, james.clark, zengxianjun,
linux-perf-users, linux-kernel
On 2025/11/11 21:30, Sebastian Andrzej Siewior wrote:
> On 2025-11-10 00:41:22 [+0800], Liangyan wrote:
>> We got the warning below during a perf test.
> …
>>
>> A race condition occurs between task context and IRQ context when
>> handling sigtrap tracepoint event overflows:
>>
>> 1. In task context, an event overflows and its pending work is
>> queued to task->task_works.
>> 2. Before pending_work is set, the same event overflows in IRQ context.
>> 3. Both contexts queue the same perf pending work to task->task_works.
>>
>> This double queuing causes:
>> - task_work_run() enters an infinite loop calling perf_pending_task()
>> - Potential warnings and use-after-free when the event is freed in
>> perf_pending_task()
>>
>> Fix the race by disabling interrupts while the perf pending work is queued.
>
> Makes sense. Let's hope it does not overflow in NMI, too.
> Either way, I suggest trimming the commit message and removing the
> backtrace, as it probably does not add any value.
>
> Sebastian
Got it, I will remove the backtrace in the following version.
* [PATCH v3] perf/core: Fix pending work re-queued in __perf_event_overflow
2025-11-09 16:41 ` [PATCH v2] " Liangyan
2025-11-11 13:30 ` Sebastian Andrzej Siewior
@ 2025-11-14 3:33 ` Liangyan
2025-11-14 8:02 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 7+ messages in thread
From: Liangyan @ 2025-11-14 3:33 UTC (permalink / raw)
To: peterz, mingo, bigeasy
Cc: acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, james.clark, zengxianjun, linux-perf-users,
linux-kernel, Liangyan
We got the warning below during a perf test.
[ 467.100914] [ T1] WARNING: CPU: 0 PID: 1 at kernel/events/core.c:5147 put_pmu_ctx+0x2ef/0x3c0
[ 467.107702] [ T1] CPU: 0 UID: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G E 6.18.0-rc4-dirty #114 PREEMPT(voluntary)
[ 467.109835] [ T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1
[ 467.111027] [ T1] RIP: 0010:put_pmu_ctx+0x2ef/0x3c0
[ 467.122081] [ T1] Call Trace:
[ 467.122463] [ T1] <TASK>
[ 467.124822] [ T1] __free_event+0x337/0xa50
[ 467.125306] [ T1] perf_pending_task+0x10f/0x3b0
[ 467.125824] [ T1] task_work_run+0x140/0x210
[ 467.127413] [ T1] exit_to_user_mode_loop+0x10e/0x130
[ 467.127965] [ T1] do_syscall_64+0x26d/0x2e0
[ 467.128453] [ T1] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 467.129025] [ T1] RIP: 0033:0x7f01d22349ca
[ 467.135157] [ T1] </TASK>
A race condition occurs between task context and IRQ context when
handling sigtrap tracepoint event overflows:
1. In task context, an event overflows and its pending work is
queued to task->task_works.
2. Before pending_work is set, the same event overflows in IRQ context.
3. Both contexts queue the same perf pending work to task->task_works.
This double queuing causes:
- task_work_run() enters an infinite loop calling perf_pending_task()
- Potential warnings and use-after-free when the event is freed in
perf_pending_task()
Fix the race by disabling interrupts while the perf pending work is queued.
Fixes: c5d93d23a260 ("perf: Enqueue SIGTRAP always via task_work.")
Reported-by: Xianjun Zeng <zengxianjun@bytedance.com>
Signed-off-by: Liangyan <liangyan.peng@bytedance.com>
---
v3: Refine the commit log, as suggested by Sebastian.
---
kernel/events/core.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cae921f4d137..7c63e5fdd334 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10433,6 +10433,16 @@ static int __perf_event_overflow(struct perf_event *event,
notify_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
+ /*
+ * Task context queues the work via task_work_add() but has not yet
+ * set event->pending_work when the same event overflows in
+ * IRQ context. The IRQ path, seeing !event->pending_work,
+ * queues the work again.
+ * The double queuing causes corruption in task->task_works.
+ * Prevent this by disabling interrupts around the critical section.
+ */
+ guard(irqsave)();
+
if (!event->pending_work &&
!task_work_add(current, &event->pending_task, notify_mode)) {
event->pending_work = pending_id;
--
2.39.3 (Apple Git-145)
* Re: [PATCH v3] perf/core: Fix pending work re-queued in __perf_event_overflow
2025-11-14 3:33 ` [PATCH v3] " Liangyan
@ 2025-11-14 8:02 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-14 8:02 UTC (permalink / raw)
To: Liangyan
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, james.clark, zengxianjun,
linux-perf-users, linux-kernel
On 2025-11-14 11:33:49 [+0800], Liangyan wrote:
> We got the warning below during a perf test.
> [ 467.100914] [ T1] WARNING: CPU: 0 PID: 1 at kernel/events/core.c:5147 put_pmu_ctx+0x2ef/0x3c0
> [ 467.107702] [ T1] CPU: 0 UID: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G E 6.18.0-rc4-dirty #114 PREEMPT(voluntary)
> [ 467.109835] [ T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1
> [ 467.111027] [ T1] RIP: 0010:put_pmu_ctx+0x2ef/0x3c0
> [ 467.122081] [ T1] Call Trace:
> [ 467.122463] [ T1] <TASK>
> [ 467.124822] [ T1] __free_event+0x337/0xa50
> [ 467.125306] [ T1] perf_pending_task+0x10f/0x3b0
> [ 467.125824] [ T1] task_work_run+0x140/0x210
> [ 467.127413] [ T1] exit_to_user_mode_loop+0x10e/0x130
> [ 467.127965] [ T1] do_syscall_64+0x26d/0x2e0
> [ 467.128453] [ T1] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 467.129025] [ T1] RIP: 0033:0x7f01d22349ca
> [ 467.135157] [ T1] </TASK>
>
> A race condition occurs between task context and IRQ context when
> handling sigtrap tracepoint event overflows:
>
> 1. In task context, an event overflows and its pending work is
> queued to task->task_works.
> 2. Before pending_work is set, the same event overflows in IRQ context.
> 3. Both contexts queue the same perf pending work to task->task_works.
>
> This double queuing causes:
> - task_work_run() enters an infinite loop calling perf_pending_task()
> - Potential warnings and use-after-free when the event is freed in
> perf_pending_task()
>
> Fix the race by disabling interrupts while the perf pending work is queued.
>
> Fixes: c5d93d23a260 ("perf: Enqueue SIGTRAP always via task_work.")
> Reported-by: Xianjun Zeng <zengxianjun@bytedance.com>
> Signed-off-by: Liangyan <liangyan.peng@bytedance.com>
> ---
> v3: Refine the commit log, as suggested by Sebastian.
I assumed you would get rid of the warning backtrace, as it adds no value,
but instead you kept the whole thing, including timestamps and so on.
> ---
> kernel/events/core.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index cae921f4d137..7c63e5fdd334 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10433,6 +10433,16 @@ static int __perf_event_overflow(struct perf_event *event,
>
> notify_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
>
> + /*
> + * Task context queues the work via task_work_add() but has not yet
> + * set event->pending_work when the same event overflows in
> + * IRQ context. The IRQ path, seeing !event->pending_work,
> + * queues the work again.
> + * The double queuing causes corruption in task->task_works.
The same event can be enqueued in TASK and IRQ context because
assigning perf_event::pending_work is not atomic with regard to the
enqueue. task_work_add() does not prevent a double enqueue.
The above should be enough, if it is not self-explanatory :)
However, I did think that we have per-context events here, but it seems
those are not used in this case.
> + * Prevent this by disabling interrupts around the critical section.
> + */
> + guard(irqsave)();
> +
> if (!event->pending_work &&
> !task_work_add(current, &event->pending_task, notify_mode)) {
> event->pending_work = pending_id;
Sebastian
end of thread
Thread overview: 7+ messages
2025-11-09 10:32 [PATCH] perf/core: Fix pending work re-queued in __perf_event_overflow Liangyan
2025-11-09 11:45 ` Peter Zijlstra
2025-11-09 16:41 ` [PATCH v2] " Liangyan
2025-11-11 13:30 ` Sebastian Andrzej Siewior
2025-11-12 3:28 ` [External] " Liangyan
2025-11-14 3:33 ` [PATCH v3] " Liangyan
2025-11-14 8:02 ` Sebastian Andrzej Siewior