[BUG] Deadlock triggered by bpfsnoop funcgraph feature

bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [BUG] Deadlock triggered by bpfsnoop funcgraph feature
@ 2025-08-27  2:13 Leon Hwang
  2025-08-27  2:23 ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Hwang @ 2025-08-27  2:13 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa

Hi,

I’ve encountered a reproducible deadlock while developing the funcgraph
feature for bpfsnoop [0].

Even on the latest bpf-next_base commit
2465bb83e0b4 ("Merge branch
's390-bpf-add-s390-jit-support-for-timed-may_goto'"),
the issue still persists.

Reproduction:
1. Build bpfsnoop with Go 1.24 and LLVM 20.
2. Start a VM using vmtest [1].
3. Trigger the deadlock with:
   './bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug'

Logs:
[  126.934205] watchdog: CPU1: Watchdog detected hard LOCKUP on cpu 1
[  126.934406] Modules linked in:
[  126.934713] irq event stamp: 283284
[  126.934806] hardirqs last  enabled at (283283): [<ffffffffa7fa89f8>]
default_idle_call+0xb8/0x1d0
[  126.934925] hardirqs last disabled at (283284): [<ffffffffa73ac21f>]
tick_nohz_idle_exit+0x8f/0x110
[  126.935026] softirqs last  enabled at (283262): [<ffffffffa72a4a06>]
__irq_exit_rcu+0xa6/0xd0
[  126.935124] softirqs last disabled at (283255): [<ffffffffa72a4a06>]
__irq_exit_rcu+0xa6/0xd0
[  126.935518] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted
6.17.0-rc1-gcb708c11617a #23 PREEMPT(full)
[  126.935662] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX,
1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  126.935865] RIP: 0010:__lock_acquire+0x30f/0x2590
[  126.935973] Code: 89 f8 45 89 f7 49 89 de 4c 89 e3 41 89 cc 48 89 c1
eb 3e 48 8d 04 80 48 8d 04 80 48 8d 34 c5 40 78 87 a9 0f b6 86 c4 00 00
00 <84> c0 74 12 41 38 c0 44 0f 47 c0 80 be c6 00 00 00 02 44 0f 44 c0
[  126.936062] RSP: 0018:ffffad20800ab008 EFLAGS: 00000007
[  126.936219] RAX: 0000000000000003 RBX: ffff97af803f2d18 RCX:
0000000000000000
[  126.936308] RDX: 0000000000000001 RSI: ffffffffa9877c28 RDI:
0000000000000007
[  126.936394] RBP: ffffad20800ab080 R08: 0000000000000003 R09:
0000000000000005
[  126.936480] R10: 0000000000000000 R11: 0000000000000007 R12:
0000000000000003
[  126.936566] R13: ffff97af803f2240 R14: ffff97af803f2db8 R15:
0000000000000000
[  126.936655] FS:  0000000000000000(0000) GS:ffff97b0126d8000(0000)
knlGS:0000000000000000
[  126.936744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  126.936830] CR2: 000000c0285a3000 CR3: 0000000102c62004 CR4:
0000000000770ef0
[  126.936918] PKRU: 55555554
[  126.937038] Call Trace:
[  126.937133]  <TASK>
[  126.937222]  ? __lock_acquire+0x43d/0x2590
[  126.937620]  lock_acquire+0xb1/0x2c0
[  126.937706]  ? __bpf_prog_enter_recur+0x2a/0x110
[  126.937826]  ? lock_release+0xc6/0x280
[  126.937910]  ? lock_release+0xc6/0x280
[  126.938006]  __bpf_prog_enter_recur+0x3e/0x110
[  126.938090]  ? __bpf_prog_enter_recur+0x2a/0x110
[  126.938204]  bpf_trampoline_6442539790+0x88/0x110
[  126.938301]  rcu_lockdep_current_cpu_online+0x9/0x70
[  126.938392]  ? rcu_read_lock_held+0x31/0x60
[  126.938501]  bpf_trampoline_6442539812+0x66/0x110
[  126.938594]  rcu_read_lock_held+0x9/0x60
[  126.938678]  ? __htab_map_lookup_elem+0x25/0xf0
[  126.938798]  bpf_trampoline_6442491246+0x79/0x123
[  126.938894]  __htab_map_lookup_elem+0x9/0xf0
[  126.938991]  ?
bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x129/0x14a
[  126.939080]  ? __htab_map_lookup_elem+0x9/0xf0
[  126.939182]  bpf_prog_1d471894f1fc624c_bpfsnoop_fgraph+0x12e/0x3e8
[  126.939285]  ? lock_release+0xc6/0x280
[  126.939381]  ? __bpf_prog_enter_recur+0x43/0x110
[  126.939473]  bpf_trampoline_6442539790+0x4b/0x110
[  126.939566]  rcu_lockdep_current_cpu_online+0x9/0x70
[  126.939649]  ? rcu_read_lock_held+0x31/0x60
[  126.939737]  bpf_trampoline_6442539812+0x66/0x110
[  126.939829]  rcu_read_lock_held+0x9/0x60
[  126.939913]  ? __htab_map_lookup_elem+0x25/0xf0
[  126.940010]  bpf_trampoline_6442491246+0x79/0x123
[  126.940105]  __htab_map_lookup_elem+0x9/0xf0
[  126.940212]  ?
bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x129/0x14a
[  126.940300]  ? rcu_lockdep_current_cpu_online+0x9/0x70
[  126.940402]  bpf_prog_1ed83077283e3ded_bpfsnoop_fgraph+0x12e/0x423
[  126.940517]  ? __bpf_prog_enter_recur+0x43/0x110
[  126.940609]  bpf_trampoline_6442491246+0xac/0x123
[  126.940705]  __htab_map_lookup_elem+0x9/0xf0
[  126.940796]  ?
bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x95/0x14a
[  126.940895]  ? bpf_prog_1d471894f1fc624c_bpfsnoop_fgraph+0x12e/0x3e8
[  126.940980]  ? bpf_prog_1d471894f1fc624c_bpfsnoop_fgraph+0x12e/0x3e8
[  126.941080]  bpf_prog_8c9f4824b35e5d8e_bpfsnoop_fgraph+0x12e/0x423
[  126.941181]  ? lock_release+0xc6/0x280
[  126.941265]  ? lock_release+0xc6/0x280
[  126.941360]  ? __bpf_prog_enter_recur+0x43/0x110
[  126.941452]  bpf_trampoline_6442539790+0x99/0x110
[  126.941544]  rcu_lockdep_current_cpu_online+0x9/0x70
[  126.941627]  ? rcu_read_lock_held+0x31/0x60
[  126.941715]  bpf_trampoline_6442539812+0x66/0x110
[  126.941813]  rcu_read_lock_held+0x9/0x60
[  126.941896]  ? __htab_map_lookup_elem+0x25/0xf0
[  126.941993]  bpf_trampoline_6442491246+0x79/0x123
[  126.942089]  __htab_map_lookup_elem+0x9/0xf0
[  126.942186]  ?
bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x95/0x14a
[  126.942276]  ? bpf_trampoline_6442539812+0x4b/0x110
[  126.942360]  ? bpf_trampoline_6442539812+0x4b/0x110
[  126.942446]  bpf_prog_8c9f4824b35e5d8e_bpfsnoop_fgraph+0x12e/0x423
[  126.942552]  ? __bpf_tramp_exit+0x72/0x130
[  126.942647]  ? __bpf_prog_enter_recur+0x43/0x110
[  126.942739]  bpf_trampoline_6442539812+0x99/0x110
[  126.942832]  rcu_read_lock_held+0x9/0x60
[  126.942915]  ? __htab_map_lookup_elem+0x25/0xf0
[  126.943012]  bpf_trampoline_6442491246+0x79/0x123
[  126.943108]  __htab_map_lookup_elem+0x9/0xf0
[  126.943209]  ?
bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x95/0x14a
[  126.943299]  ? bpf_trampoline_6442491246+0x79/0x123
[  126.943383]  ? bpf_trampoline_6442491246+0x79/0x123
[  126.943469]  bpf_prog_1d471894f1fc624c_bpfsnoop_fgraph+0x12e/0x3e8
[  126.943571]  ? lock_release+0xc6/0x280
[  126.943666]  ? __bpf_prog_enter_recur+0x43/0x110
[  126.943758]  bpf_trampoline_6442539812+0x4b/0x110
[  126.943851]  rcu_read_lock_held+0x9/0x60
[  126.943934]  ? __htab_map_lookup_elem+0x25/0xf0
[  126.944031]  bpf_trampoline_6442491246+0x79/0x123
[  126.944126]  __htab_map_lookup_elem+0x9/0xf0
[  126.944235]  ?
bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x129/0x14a
[  126.944345]  ? bpf_prog_5c5e9b8ca18045f2_bpfsnoop_fgraph+0x12e/0x3f2
[  126.944442]  bpf_prog_5c5b59f2388bb72a_bpfsnoop_fgraph+0x12e/0x3f2
[  126.944545]  ? lock_release+0xc6/0x280
[  126.944640]  ? __bpf_prog_enter_recur+0x43/0x110
[  126.944732]  bpf_trampoline_6442491246+0x56/0x123
[  126.944828]  __htab_map_lookup_elem+0x9/0xf0
[  126.944931]  ?
bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x129/0x14a
[  126.945019]  ? tick_nohz_idle_exit+0xc9/0x110
[  126.945108]  bpf_prog_5c5e9b8ca18045f2_bpfsnoop_fgraph+0x12e/0x3f2
[  126.945210]  ? lock_release+0xc6/0x280
[  126.945305]  ? __bpf_prog_enter_recur+0x43/0x110
[  126.945411]  bpf_trampoline_6442519845+0x5e/0x133
[  126.945510]  hrtimer_start_range_ns+0x9/0x4b0
[  126.945603]  ? tick_nohz_restart_sched_tick+0x89/0xe0
[  126.945694]  tick_nohz_idle_exit+0xc9/0x110
[  126.945789]  do_idle+0x150/0x250
[  126.945890]  cpu_startup_entry+0x2d/0x30
[  126.945976]  start_secondary+0xfc/0x100
[  126.946069]  common_startup_64+0x12c/0x138
[  126.946197]  </TASK>

Full log: [2].

Additional information:
* Kernel version: 6.17.0-rc1-gcb708c11617a
* Config: [3].

Links:
[0] https://github.com/bpfsnoop/bpfsnoop
[1] https://github.com/danobi/vmtest
[2]
https://gist.githubusercontent.com/Asphaltt/88d11c49e62485f4d4f4a7664089c3cd/raw/f26c123c0ec5f3e5ac588844db51bbec0bb0f9c7/deadlock-crash.log
[3]
https://gist.githubusercontent.com/Asphaltt/88d11c49e62485f4d4f4a7664089c3cd/raw/f26c123c0ec5f3e5ac588844db51bbec0bb0f9c7/config

Thanks,
Leon


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-27  2:13 [BUG] Deadlock triggered by bpfsnoop funcgraph feature Leon Hwang
@ 2025-08-27  2:23 ` Alexei Starovoitov
  2025-08-27  2:58   ` Leon Hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2025-08-27  2:23 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jiri Olsa

On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Hi,
>
> I’ve encountered a reproducible deadlock while developing the funcgraph
> feature for bpfsnoop [0].

debug it pls.
Sounds like you're implying that the root cause is in bpf,
but why do you think so?

You're attaching to things that shouldn't be attached to.
Like rcu_lockdep_current_cpu_online()
so effectively you're recursing in that lockdep code.
See big lock there. It will dead lock for sure.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-27  2:23 ` Alexei Starovoitov
@ 2025-08-27  2:58   ` Leon Hwang
  2025-08-28  0:42     ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Hwang @ 2025-08-27  2:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jiri Olsa



On 27/8/25 10:23, Alexei Starovoitov wrote:
> On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> Hi,
>>
>> I’ve encountered a reproducible deadlock while developing the funcgraph
>> feature for bpfsnoop [0].
> 
> debug it pls.

It’s quite difficult for me. I’ve tried debugging it but didn’t succeed.

> Sounds like you're implying that the root cause is in bpf,
> but why do you think so?
> 
> You're attaching to things that shouldn't be attached to.
> Like rcu_lockdep_current_cpu_online()
> so effectively you're recursing in that lockdep code.
> See big lock there. It will dead lock for sure.

If a function that acquires a lock can be traced by a tracing program,
bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a
deadlock is highly likely to occur.

With bpfsnoop I try my best to avoid such deadlock issues. But what
about other bpf tracing tools? If they don’t handle this properly, the
kernel is very likely to crash.

Thanks,
Leon


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-27  2:58   ` Leon Hwang
@ 2025-08-28  0:42     ` Alexei Starovoitov
  2025-08-28  2:40       ` Leon Hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2025-08-28  0:42 UTC (permalink / raw)
  To: Leon Hwang, Paul E. McKenney
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jiri Olsa

On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 27/8/25 10:23, Alexei Starovoitov wrote:
> > On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> Hi,
> >>
> >> I’ve encountered a reproducible deadlock while developing the funcgraph
> >> feature for bpfsnoop [0].
> >
> > debug it pls.
>
> It’s quite difficult for me. I’ve tried debugging it but didn’t succeed.
>
> > Sounds like you're implying that the root cause is in bpf,
> > but why do you think so?
> >
> > You're attaching to things that shouldn't be attached to.
> > Like rcu_lockdep_current_cpu_online()
> > so effectively you're recursing in that lockdep code.
> > See big lock there. It will dead lock for sure.
>
> If a function that acquires a lock can be traced by a tracing program,
> bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a
> deadlock is highly likely to occur.
>
> With bpfsnoop I try my best to avoid such deadlock issues. But what
> about other bpf tracing tools? If they don’t handle this properly, the
> kernel is very likely to crash.

bpf infra is trying hard not to crash it, but debug kernel is a different
category. rcu_read_lock_held() doesn't exist in production kernels.
You can propose adding "notrace" for it, but in general that doesn't scale.
Same with rcu_lockdep_current_cpu_online().
It probably deserves "notrace" too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-28  0:42     ` Alexei Starovoitov
@ 2025-08-28  2:40       ` Leon Hwang
  2025-08-28 11:50         ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Hwang @ 2025-08-28  2:40 UTC (permalink / raw)
  To: Alexei Starovoitov, Paul E. McKenney
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jiri Olsa



On 28/8/25 08:42, Alexei Starovoitov wrote:
> On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 27/8/25 10:23, Alexei Starovoitov wrote:
>>> On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I’ve encountered a reproducible deadlock while developing the funcgraph
>>>> feature for bpfsnoop [0].
>>>
>>> debug it pls.
>>
>> It’s quite difficult for me. I’ve tried debugging it but didn’t succeed.
>>
>>> Sounds like you're implying that the root cause is in bpf,
>>> but why do you think so?
>>>
>>> You're attaching to things that shouldn't be attached to.
>>> Like rcu_lockdep_current_cpu_online()
>>> so effectively you're recursing in that lockdep code.
>>> See big lock there. It will dead lock for sure.
>>
>> If a function that acquires a lock can be traced by a tracing program,
>> bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a
>> deadlock is highly likely to occur.
>>
>> With bpfsnoop I try my best to avoid such deadlock issues. But what
>> about other bpf tracing tools? If they don’t handle this properly, the
>> kernel is very likely to crash.
> 
> bpf infra is trying hard not to crash it, but debug kernel is a different
> category. rcu_read_lock_held() doesn't exist in production kernels.
> You can propose adding "notrace" for it, but in general that doesn't scale.
> Same with rcu_lockdep_current_cpu_online().
> It probably deserves "notrace" too.

Indeed, it doesn't scale.

When I run
./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug
--fgraph-exclude
'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack',
the kernel doesn’t panic, but the OS eventually stalls and becomes
unresponsive to key presses.

It seems preferable to avoid running BPF programs continuously in such
cases.

Thanks,
Leon


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-28  2:40       ` Leon Hwang
@ 2025-08-28 11:50         ` Paul E. McKenney
  2025-08-28 13:39           ` Leon Hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2025-08-28 11:50 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa

On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote:
> On 28/8/25 08:42, Alexei Starovoitov wrote:
> > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >> On 27/8/25 10:23, Alexei Starovoitov wrote:
> >>> On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I’ve encountered a reproducible deadlock while developing the funcgraph
> >>>> feature for bpfsnoop [0].
> >>>
> >>> debug it pls.
> >>
> >> It’s quite difficult for me. I’ve tried debugging it but didn’t succeed.
> >>
> >>> Sounds like you're implying that the root cause is in bpf,
> >>> but why do you think so?
> >>>
> >>> You're attaching to things that shouldn't be attached to.
> >>> Like rcu_lockdep_current_cpu_online()
> >>> so effectively you're recursing in that lockdep code.
> >>> See big lock there. It will dead lock for sure.
> >>
> >> If a function that acquires a lock can be traced by a tracing program,
> >> bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a
> >> deadlock is highly likely to occur.
> >>
> >> With bpfsnoop I try my best to avoid such deadlock issues. But what
> >> about other bpf tracing tools? If they don’t handle this properly, the
> >> kernel is very likely to crash.
> > 
> > bpf infra is trying hard not to crash it, but debug kernel is a different
> > category. rcu_read_lock_held() doesn't exist in production kernels.
> > You can propose adding "notrace" for it, but in general that doesn't scale.
> > Same with rcu_lockdep_current_cpu_online().
> > It probably deserves "notrace" too.
> 
> Indeed, it doesn't scale.
> 
> When I run
> ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug
> --fgraph-exclude
> 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack',
> the kernel doesn’t panic, but the OS eventually stalls and becomes
> unresponsive to key presses.
> 
> It seems preferable to avoid running BPF programs continuously in such
> cases.

Agreed, when adding code to the Linux kernel, whether via a patch, via
a BPF program, or by whatever other means, you are taking responsibility
for the speed, scalability, and latency effects of that code.

Nevertheless, I am happy to add a few "notrace" modifiers
if needed.  Do you guys need them for rcu_read_lock_held() and
rcu_lockdep_current_cpu_online()?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-28 11:50         ` Paul E. McKenney
@ 2025-08-28 13:39           ` Leon Hwang
  2025-08-28 16:43             ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Hwang @ 2025-08-28 13:39 UTC (permalink / raw)
  To: paulmck
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa

On Thu Aug 28, 2025 at 7:50 PM +08, Paul E. McKenney wrote:
> On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote:
>> On 28/8/25 08:42, Alexei Starovoitov wrote:
>> > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote:

[...]

>> >
>> > bpf infra is trying hard not to crash it, but debug kernel is a different
>> > category. rcu_read_lock_held() doesn't exist in production kernels.
>> > You can propose adding "notrace" for it, but in general that doesn't scale.
>> > Same with rcu_lockdep_current_cpu_online().
>> > It probably deserves "notrace" too.
>>
>> Indeed, it doesn't scale.
>>
>> When I run
>> ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug
>> --fgraph-exclude
>> 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack',
>> the kernel doesn’t panic, but the OS eventually stalls and becomes
>> unresponsive to key presses.
>>
>> It seems preferable to avoid running BPF programs continuously in such
>> cases.
>
> Agreed, when adding code to the Linux kernel, whether via a patch, via
> a BPF program, or by whatever other means, you are taking responsibility
> for the speed, scalability, and latency effects of that code.
>
> Nevertheless, I am happy to add a few "notrace" modifiers
> if needed.  Do you guys need them for rcu_read_lock_held() and
> rcu_lockdep_current_cpu_online()?
>

I think it would be better to add "notrace" to following functions:

./bpfsnoop -k 'rcu_read_*lock_*held*,rcu_lockdep_*' --show-func-proto
bool rcu_lockdep_current_cpu_online(); [traceable]
int rcu_read_lock_any_held(); [traceable]
int rcu_read_lock_bh_held(); [traceable]
int rcu_read_lock_held(); [traceable]
int rcu_read_lock_sched_held(); [traceable]

Thanks,
Leon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-28 13:39           ` Leon Hwang
@ 2025-08-28 16:43             ` Alexei Starovoitov
  2025-08-28 17:24               ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2025-08-28 16:43 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Paul E. McKenney, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa

On Thu, Aug 28, 2025 at 6:39 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> On Thu Aug 28, 2025 at 7:50 PM +08, Paul E. McKenney wrote:
> > On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote:
> >> On 28/8/25 08:42, Alexei Starovoitov wrote:
> >> > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> [...]
>
> >> >
> >> > bpf infra is trying hard not to crash it, but debug kernel is a different
> >> > category. rcu_read_lock_held() doesn't exist in production kernels.
> >> > You can propose adding "notrace" for it, but in general that doesn't scale.
> >> > Same with rcu_lockdep_current_cpu_online().
> >> > It probably deserves "notrace" too.
> >>
> >> Indeed, it doesn't scale.
> >>
> >> When I run
> >> ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug
> >> --fgraph-exclude
> >> 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack',
> >> the kernel doesn’t panic, but the OS eventually stalls and becomes
> >> unresponsive to key presses.
> >>
> >> It seems preferable to avoid running BPF programs continuously in such
> >> cases.
> >
> > Agreed, when adding code to the Linux kernel, whether via a patch, via
> > a BPF program, or by whatever other means, you are taking responsibility
> > for the speed, scalability, and latency effects of that code.
> >
> > Nevertheless, I am happy to add a few "notrace" modifiers
> > if needed.  Do you guys need them for rcu_read_lock_held() and
> > rcu_lockdep_current_cpu_online()?
> >
>
> I think it would be better to add "notrace" to following functions:
>
> ./bpfsnoop -k 'rcu_read_*lock_*held*,rcu_lockdep_*' --show-func-proto
> bool rcu_lockdep_current_cpu_online(); [traceable]
> int rcu_read_lock_any_held(); [traceable]
> int rcu_read_lock_bh_held(); [traceable]
> int rcu_read_lock_held(); [traceable]
> int rcu_read_lock_sched_held(); [traceable]

Agree. Seems like an easy way to remove a footgun.

Independently it would be good to make noinstr/notrace to include __cpuidle
functions. I think right now it's allowed to attach to default_idle()
which is causing issues.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-28 16:43             ` Alexei Starovoitov
@ 2025-08-28 17:24               ` Paul E. McKenney
  2025-08-29  2:21                 ` Leon Hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2025-08-28 17:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa

On Thu, Aug 28, 2025 at 09:43:00AM -0700, Alexei Starovoitov wrote:
> On Thu, Aug 28, 2025 at 6:39 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >
> > On Thu Aug 28, 2025 at 7:50 PM +08, Paul E. McKenney wrote:
> > > On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote:
> > >> On 28/8/25 08:42, Alexei Starovoitov wrote:
> > >> > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >
> > [...]
> >
> > >> >
> > >> > bpf infra is trying hard not to crash it, but debug kernel is a different
> > >> > category. rcu_read_lock_held() doesn't exist in production kernels.
> > >> > You can propose adding "notrace" for it, but in general that doesn't scale.
> > >> > Same with rcu_lockdep_current_cpu_online().
> > >> > It probably deserves "notrace" too.
> > >>
> > >> Indeed, it doesn't scale.
> > >>
> > >> When I run
> > >> ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug
> > >> --fgraph-exclude
> > >> 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack',
> > >> the kernel doesn’t panic, but the OS eventually stalls and becomes
> > >> unresponsive to key presses.
> > >>
> > >> It seems preferable to avoid running BPF programs continuously in such
> > >> cases.
> > >
> > > Agreed, when adding code to the Linux kernel, whether via a patch, via
> > > a BPF program, or by whatever other means, you are taking responsibility
> > > for the speed, scalability, and latency effects of that code.
> > >
> > > Nevertheless, I am happy to add a few "notrace" modifiers
> > > if needed.  Do you guys need them for rcu_read_lock_held() and
> > > rcu_lockdep_current_cpu_online()?
> > >
> >
> > I think it would be better to add "notrace" to following functions:
> >
> > ./bpfsnoop -k 'rcu_read_*lock_*held*,rcu_lockdep_*' --show-func-proto
> > bool rcu_lockdep_current_cpu_online(); [traceable]
> > int rcu_read_lock_any_held(); [traceable]
> > int rcu_read_lock_bh_held(); [traceable]
> > int rcu_read_lock_held(); [traceable]
> > int rcu_read_lock_sched_held(); [traceable]
> 
> Agree. Seems like an easy way to remove a footgun.

Very good, and please see below.  This might or might not make the next
merge window, but if not, it should be good for the one after that.

> Independently it would be good to make noinstr/notrace to include __cpuidle
> functions. I think right now it's allowed to attach to default_idle()
> which is causing issues.

Leon, would you be interested in putting together a patch for these?

							Thanx, Paul

------------------------------------------------------------------------

commit dada60c8851f19e54524cc1bcf8ab5938eb909c9
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Aug 28 10:17:10 2025 -0700

    rcu: Mark diagnostic functions as notrace
    
    The rcu_lockdep_current_cpu_online(), rcu_read_lock_sched_held(),
    rcu_read_lock_held(), rcu_read_lock_bh_held(), rcu_read_lock_any_held()
    are used by tracing-related code paths, so putting traces on them is
    unlikely to make anyone happy.  This commit therefore marks them all
    "notrace".
    
    Reported-by: Leon Hwang <leon.hwang@linux.dev>
    Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1291e0761d70ab..2515ee9a82df4f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4021,7 +4021,7 @@ bool rcu_cpu_online(int cpu)
  * RCU on an offline processor during initial boot, hence the check for
  * rcu_scheduler_fully_active.
  */
-bool rcu_lockdep_current_cpu_online(void)
+bool notrace rcu_lockdep_current_cpu_online(void)
 {
 	struct rcu_data *rdp;
 	bool ret = false;
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index c912b594ba987f..dfeba9b3539508 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -117,7 +117,7 @@ static bool rcu_read_lock_held_common(bool *ret)
 	return false;
 }
 
-int rcu_read_lock_sched_held(void)
+int notrace rcu_read_lock_sched_held(void)
 {
 	bool ret;
 
@@ -342,7 +342,7 @@ EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
  * Note that rcu_read_lock() is disallowed if the CPU is either idle or
  * offline from an RCU perspective, so check for those as well.
  */
-int rcu_read_lock_held(void)
+int notrace rcu_read_lock_held(void)
 {
 	bool ret;
 
@@ -367,7 +367,7 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
  * Note that rcu_read_lock_bh() is disallowed if the CPU is either idle or
  * offline from an RCU perspective, so check for those as well.
  */
-int rcu_read_lock_bh_held(void)
+int notrace rcu_read_lock_bh_held(void)
 {
 	bool ret;
 
@@ -377,7 +377,7 @@ int rcu_read_lock_bh_held(void)
 }
 EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
 
-int rcu_read_lock_any_held(void)
+int notrace rcu_read_lock_any_held(void)
 {
 	bool ret;
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-28 17:24               ` Paul E. McKenney
@ 2025-08-29  2:21                 ` Leon Hwang
  2025-08-29 18:08                   ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Hwang @ 2025-08-29  2:21 UTC (permalink / raw)
  To: paulmck, Alexei Starovoitov
  Cc: Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa



On 29/8/25 01:24, Paul E. McKenney wrote:
> On Thu, Aug 28, 2025 at 09:43:00AM -0700, Alexei Starovoitov wrote:
>> On Thu, Aug 28, 2025 at 6:39 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>
>>> On Thu Aug 28, 2025 at 7:50 PM +08, Paul E. McKenney wrote:

[...]

>>>
>>> I think it would be better to add "notrace" to following functions:
>>>
>>> ./bpfsnoop -k 'rcu_read_*lock_*held*,rcu_lockdep_*' --show-func-proto
>>> bool rcu_lockdep_current_cpu_online(); [traceable]
>>> int rcu_read_lock_any_held(); [traceable]
>>> int rcu_read_lock_bh_held(); [traceable]
>>> int rcu_read_lock_held(); [traceable]
>>> int rcu_read_lock_sched_held(); [traceable]
>>
>> Agree. Seems like an easy way to remove a footgun.
> 
> Very good, and please see below.  This might or might not make the next
> merge window, but if not, it should be good for the one after that.
> 
>> Independently it would be good to make noinstr/notrace to include __cpuidle
>> functions. I think right now it's allowed to attach to default_idle()
>> which is causing issues.

Nope.
./bpfsnoop -k 'default_idle' -k '__cpuidle' --show-func-proto
void default_idle();

Here are the traced functions of
./bpfsnoop -D -k "htab_*_elem" --output-fgraph --fgraph-debug
--fgraph-exclude
'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack':

htab_lru_map_delete_elem
htab_lru_map_lookup_and_delete_elem
htab_lru_map_lookup_elem
htab_lru_map_update_elem
htab_lru_percpu_map_lookup_and_delete_elem
htab_lru_percpu_map_lookup_elem
htab_lru_percpu_map_lookup_percpu_elem
htab_lru_percpu_map_update_elem
htab_map_delete_elem
htab_map_lookup_and_delete_elem
htab_map_lookup_elem
htab_map_seq_show_elem
htab_map_update_elem
htab_of_map_lookup_elem
htab_percpu_map_lookup_and_delete_elem
htab_percpu_map_lookup_elem
htab_percpu_map_lookup_percpu_elem
htab_percpu_map_seq_show_elem
htab_percpu_map_update_elem
add_taint
alloc_htab_elem
arch_scale_cpu_capacity
arch_stack_walk
bpf_list_head_free
__bpf_obj_drop_impl
bpf_obj_free_fields
__bpf_prog_put
bpf_prog_put
bpf_rb_root_free
bpf_timer_cancel_and_free
bpf_wq_cancel_and_free
btf_find_struct_meta
btf_is_kernel
btf_type_seq_show
btf_type_seq_show_flags
btf_type_show
check_and_free_fields
check_panic_on_warn
check_timeout
clear_pending_if_disabled
console_verbose
copy_map_value_locked
ctx_sched_out
__delayacct_blkio_start
__do_set_cpus_allowed
do_trace_write_msr
enter_lazy_tlb
__folio_put
__folio_unqueue_deferred_split
free_frozen_pages
free_htab_elem
free_huge_folio
get_free_pages_noprof
get_state_synchronize_rcu
get_state_synchronize_rcu_full
get_taint
hrtick_start_fair
hrtimer_active
hrtimer_cancel
hrtimer_setup
hrtimer_start_range_ns
__htab_lru_percpu_map_update_elem
__htab_map_lookup_and_delete_elem
__htab_map_lookup_elem
htab_map_update_elem_in_place
invalidate_user_asid
__is_kernel_percpu_address
is_kernel_percpu_address
is_module_address
__is_module_percpu_address
is_module_percpu_address
is_vmalloc_addr
kallsyms_lookup
kexec_crash_loaded
kvfree_call_rcu
local_clock
__mem_cgroup_uncharge
__might_sleep
mm_needs_global_asid
mod_node_page_state
__msecs_to_jiffies
nbcon_cpu_emergency_enter
nbcon_cpu_emergency_exit
nbcon_get_cpu_emergency_nesting
pcpu_copy_value
perf_cgroup_switch
perf_clear_dirty_counters
perf_ctx_disable
perf_ctx_enable
perf_ctx_sched_task_cb
__perf_event_task_sched_out
perf_event_update_time
perf_event_update_userpage
perf_exclude_event
perf_pmu_sched_task
___perf_sw_event
perf_swevent_event
pick_next_task_fair
pick_task_fair
pick_task_fair
pick_task_idle
preempt_model_str
__printk_cpu_sync_put
__printk_cpu_sync_try_get
__printk_cpu_sync_wait
printk_percpu_data_ready
print_modules
print_stop_info
print_tainted
print_tainted_verbose
print_worker_info
profile_hits
put_prev_entity
queue_delayed_work_on
queued_spin_lock_slowpath
__queue_work
queue_work_on
rcu_note_context_switch
rcu_qs
rcu_report_exp_cpu_mult
rcu_tasks_trace_qs_blkd
rcu_trc_cmpxchg_need_qs
resilient_queued_spin_lock_slowpath
sched_balance_newidle
__schedule_bug
__schedule_delayed_monitor_work
seq_putc
seq_write
set_next_entity
stack_trace_print
stack_trace_save
switch_ldt
switch_mm_irqs_off
synchronize_rcu
task_h_load
task_work_add
this_cpu_in_panic
touch_all_softlockup_watchdogs
__traceiter_contention_begin
__traceiter_contention_end
__traceiter_lock_acquire
__traceiter_lock_release
__traceiter_rcu_preempt_task
__traceiter_sched_entry_tp
__traceiter_sched_exit_tp
__traceiter_tlb_flush
unpin_user_page
__update_context_time
update_rq_clock

> 
> Leon, would you be interested in putting together a patch for these?
> 

I think we can apply this patch first.

If other patch is required, I would like to send it later.

Thanks,
Leon


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-29  2:21                 ` Leon Hwang
@ 2025-08-29 18:08                   ` Alexei Starovoitov
  2025-09-01  2:38                     ` Leon Hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2025-08-29 18:08 UTC (permalink / raw)
  To: Leon Hwang, Jiri Olsa
  Cc: Paul E. McKenney, Leon Hwang, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann

On Thu, Aug 28, 2025 at 7:21 PM Leon Hwang <hffilwlqm@gmail.com> wrote:
>
>
> >
> >> Independently it would be good to make noinstr/notrace to include __cpuidle
> >> functions. I think right now it's allowed to attach to default_idle()
> >> which is causing issues.
>
> Nope.
> ./bpfsnoop -k 'default_idle' -k '__cpuidle' --show-func-proto
> void default_idle();

hmm. indeed.

Jiri,

why do we still have this in selftest ?

static bool skip_entry(char *name)
{
        /*
         * We attach to almost all kernel functions and some of them
         * will cause 'suspicious RCU usage' when fprobe is attached
         * to them. Filter out the current culprits - arch_cpu_idle
         * default_idle and rcu_* functions.
         */
        if (!strcmp(name, "arch_cpu_idle"))
                return true;
        if (!strcmp(name, "default_idle"))
                return true;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature
  2025-08-29 18:08                   ` Alexei Starovoitov
@ 2025-09-01  2:38                     ` Leon Hwang
  0 siblings, 0 replies; 12+ messages in thread
From: Leon Hwang @ 2025-09-01  2:38 UTC (permalink / raw)
  To: Alexei Starovoitov, Leon Hwang, Jiri Olsa
  Cc: Paul E. McKenney, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann



On 30/8/25 02:08, Alexei Starovoitov wrote:
> On Thu, Aug 28, 2025 at 7:21 PM Leon Hwang <hffilwlqm@gmail.com> wrote:
>>
>>
>>>
>>>> Independently it would be good to make noinstr/notrace to include __cpuidle
>>>> functions. I think right now it's allowed to attach to default_idle()
>>>> which is causing issues.
>>
>> Nope.
>> ./bpfsnoop -k 'default_idle' -k '__cpuidle' --show-func-proto
>> void default_idle();
> 
> hmm. indeed.
> 
> Jiri,
> 
> why do we still have this in selftest ?
> 
> static bool skip_entry(char *name)
> {
>         /*
>          * We attach to almost all kernel functions and some of them
>          * will cause 'suspicious RCU usage' when fprobe is attached
>          * to them. Filter out the current culprits - arch_cpu_idle
>          * default_idle and rcu_* functions.
>          */
>         if (!strcmp(name, "arch_cpu_idle"))
>                 return true;
>         if (!strcmp(name, "default_idle"))
>                 return true;

After removing "arch_cpu_idle" and "default_idle", it's OK to run the
selftest:

./test_progs -t kprobe_multi_test
#155/1   kprobe_multi_test/skel_api:OK
#155/2   kprobe_multi_test/link_api_addrs:OK
#155/3   kprobe_multi_test/link_api_syms:OK
#155/4   kprobe_multi_test/attach_api_pattern:OK
#155/5   kprobe_multi_test/attach_api_addrs:OK
#155/6   kprobe_multi_test/attach_api_syms:OK
#155/7   kprobe_multi_test/attach_api_fails:OK
#155/8   kprobe_multi_test/attach_override:OK
#155/9   kprobe_multi_test/session:OK
#155/10  kprobe_multi_test/session_cookie:OK
#155/11  kprobe_multi_test/unique_match:OK
#155/12  kprobe_multi_test/kprobe_session_return_0:OK
#155/13  kprobe_multi_test/kprobe_session_return_1:OK
#155/14  kprobe_multi_test/kprobe_session_return_2:OK
#155     kprobe_multi_test:OK
#156/1   kprobe_multi_testmod_test/testmod_attach_api_syms:OK
#156/2   kprobe_multi_testmod_test/testmod_attach_api_addrs:OK
#156     kprobe_multi_testmod_test:OK
Summary: 2/16 PASSED, 0 SKIPPED, 0 FAILED

Thanks,
Leon


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-09-01  2:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-27  2:13 [BUG] Deadlock triggered by bpfsnoop funcgraph feature Leon Hwang
2025-08-27  2:23 ` Alexei Starovoitov
2025-08-27  2:58   ` Leon Hwang
2025-08-28  0:42     ` Alexei Starovoitov
2025-08-28  2:40       ` Leon Hwang
2025-08-28 11:50         ` Paul E. McKenney
2025-08-28 13:39           ` Leon Hwang
2025-08-28 16:43             ` Alexei Starovoitov
2025-08-28 17:24               ` Paul E. McKenney
2025-08-29  2:21                 ` Leon Hwang
2025-08-29 18:08                   ` Alexei Starovoitov
2025-09-01  2:38                     ` Leon Hwang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).