* [BUG] Deadlock triggered by bpfsnoop funcgraph feature @ 2025-08-27 2:13 Leon Hwang 2025-08-27 2:23 ` Alexei Starovoitov 0 siblings, 1 reply; 12+ messages in thread From: Leon Hwang @ 2025-08-27 2:13 UTC (permalink / raw) To: bpf; +Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa Hi, I’ve encountered a reproducible deadlock while developing the funcgraph feature for bpfsnoop [0]. Even on the latest bpf-next_base commit 2465bb83e0b4 ("Merge branch 's390-bpf-add-s390-jit-support-for-timed-may_goto'"), the issue still persists. Reproduction: 1. Build bpfsnoop with Go 1.24 and LLVM 20. 2. Start a VM using vmtest [1]. 3. Trigger the deadlock with: './bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug' Logs: [ 126.934205] watchdog: CPU1: Watchdog detected hard LOCKUP on cpu 1 [ 126.934406] Modules linked in: [ 126.934713] irq event stamp: 283284 [ 126.934806] hardirqs last enabled at (283283): [<ffffffffa7fa89f8>] default_idle_call+0xb8/0x1d0 [ 126.934925] hardirqs last disabled at (283284): [<ffffffffa73ac21f>] tick_nohz_idle_exit+0x8f/0x110 [ 126.935026] softirqs last enabled at (283262): [<ffffffffa72a4a06>] __irq_exit_rcu+0xa6/0xd0 [ 126.935124] softirqs last disabled at (283255): [<ffffffffa72a4a06>] __irq_exit_rcu+0xa6/0xd0 [ 126.935518] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc1-gcb708c11617a #23 PREEMPT(full) [ 126.935662] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 126.935865] RIP: 0010:__lock_acquire+0x30f/0x2590 [ 126.935973] Code: 89 f8 45 89 f7 49 89 de 4c 89 e3 41 89 cc 48 89 c1 eb 3e 48 8d 04 80 48 8d 04 80 48 8d 34 c5 40 78 87 a9 0f b6 86 c4 00 00 00 <84> c0 74 12 41 38 c0 44 0f 47 c0 80 be c6 00 00 00 02 44 0f 44 c0 [ 126.936062] RSP: 0018:ffffad20800ab008 EFLAGS: 00000007 [ 126.936219] RAX: 0000000000000003 RBX: ffff97af803f2d18 RCX: 0000000000000000 [ 126.936308] RDX: 0000000000000001 RSI: ffffffffa9877c28 RDI: 0000000000000007 [ 126.936394] RBP: ffffad20800ab080 R08: 0000000000000003 R09: 0000000000000005 [ 126.936480] R10: 0000000000000000 R11: 0000000000000007 R12: 0000000000000003 [ 126.936566] R13: ffff97af803f2240 R14: ffff97af803f2db8 R15: 0000000000000000 [ 126.936655] FS: 0000000000000000(0000) GS:ffff97b0126d8000(0000) knlGS:0000000000000000 [ 126.936744] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 126.936830] CR2: 000000c0285a3000 CR3: 0000000102c62004 CR4: 0000000000770ef0 [ 126.936918] PKRU: 55555554 [ 126.937038] Call Trace: [ 126.937133] <TASK> [ 126.937222] ? __lock_acquire+0x43d/0x2590 [ 126.937620] lock_acquire+0xb1/0x2c0 [ 126.937706] ? __bpf_prog_enter_recur+0x2a/0x110 [ 126.937826] ? lock_release+0xc6/0x280 [ 126.937910] ? lock_release+0xc6/0x280 [ 126.938006] __bpf_prog_enter_recur+0x3e/0x110 [ 126.938090] ? __bpf_prog_enter_recur+0x2a/0x110 [ 126.938204] bpf_trampoline_6442539790+0x88/0x110 [ 126.938301] rcu_lockdep_current_cpu_online+0x9/0x70 [ 126.938392] ? rcu_read_lock_held+0x31/0x60 [ 126.938501] bpf_trampoline_6442539812+0x66/0x110 [ 126.938594] rcu_read_lock_held+0x9/0x60 [ 126.938678] ? __htab_map_lookup_elem+0x25/0xf0 [ 126.938798] bpf_trampoline_6442491246+0x79/0x123 [ 126.938894] __htab_map_lookup_elem+0x9/0xf0 [ 126.938991] ? bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x129/0x14a [ 126.939080] ? __htab_map_lookup_elem+0x9/0xf0 [ 126.939182] bpf_prog_1d471894f1fc624c_bpfsnoop_fgraph+0x12e/0x3e8 [ 126.939285] ? lock_release+0xc6/0x280 [ 126.939381] ? __bpf_prog_enter_recur+0x43/0x110 [ 126.939473] bpf_trampoline_6442539790+0x4b/0x110 [ 126.939566] rcu_lockdep_current_cpu_online+0x9/0x70 [ 126.939649] ? rcu_read_lock_held+0x31/0x60 [ 126.939737] bpf_trampoline_6442539812+0x66/0x110 [ 126.939829] rcu_read_lock_held+0x9/0x60 [ 126.939913] ? __htab_map_lookup_elem+0x25/0xf0 [ 126.940010] bpf_trampoline_6442491246+0x79/0x123 [ 126.940105] __htab_map_lookup_elem+0x9/0xf0 [ 126.940212] ? bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x129/0x14a [ 126.940300] ? rcu_lockdep_current_cpu_online+0x9/0x70 [ 126.940402] bpf_prog_1ed83077283e3ded_bpfsnoop_fgraph+0x12e/0x423 [ 126.940517] ? __bpf_prog_enter_recur+0x43/0x110 [ 126.940609] bpf_trampoline_6442491246+0xac/0x123 [ 126.940705] __htab_map_lookup_elem+0x9/0xf0 [ 126.940796] ? bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x95/0x14a [ 126.940895] ? bpf_prog_1d471894f1fc624c_bpfsnoop_fgraph+0x12e/0x3e8 [ 126.940980] ? bpf_prog_1d471894f1fc624c_bpfsnoop_fgraph+0x12e/0x3e8 [ 126.941080] bpf_prog_8c9f4824b35e5d8e_bpfsnoop_fgraph+0x12e/0x423 [ 126.941181] ? lock_release+0xc6/0x280 [ 126.941265] ? lock_release+0xc6/0x280 [ 126.941360] ? __bpf_prog_enter_recur+0x43/0x110 [ 126.941452] bpf_trampoline_6442539790+0x99/0x110 [ 126.941544] rcu_lockdep_current_cpu_online+0x9/0x70 [ 126.941627] ? rcu_read_lock_held+0x31/0x60 [ 126.941715] bpf_trampoline_6442539812+0x66/0x110 [ 126.941813] rcu_read_lock_held+0x9/0x60 [ 126.941896] ? __htab_map_lookup_elem+0x25/0xf0 [ 126.941993] bpf_trampoline_6442491246+0x79/0x123 [ 126.942089] __htab_map_lookup_elem+0x9/0xf0 [ 126.942186] ? bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x95/0x14a [ 126.942276] ? bpf_trampoline_6442539812+0x4b/0x110 [ 126.942360] ? bpf_trampoline_6442539812+0x4b/0x110 [ 126.942446] bpf_prog_8c9f4824b35e5d8e_bpfsnoop_fgraph+0x12e/0x423 [ 126.942552] ? __bpf_tramp_exit+0x72/0x130 [ 126.942647] ? __bpf_prog_enter_recur+0x43/0x110 [ 126.942739] bpf_trampoline_6442539812+0x99/0x110 [ 126.942832] rcu_read_lock_held+0x9/0x60 [ 126.942915] ? __htab_map_lookup_elem+0x25/0xf0 [ 126.943012] bpf_trampoline_6442491246+0x79/0x123 [ 126.943108] __htab_map_lookup_elem+0x9/0xf0 [ 126.943209] ? bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x95/0x14a [ 126.943299] ? bpf_trampoline_6442491246+0x79/0x123 [ 126.943383] ? bpf_trampoline_6442491246+0x79/0x123 [ 126.943469] bpf_prog_1d471894f1fc624c_bpfsnoop_fgraph+0x12e/0x3e8 [ 126.943571] ? lock_release+0xc6/0x280 [ 126.943666] ? __bpf_prog_enter_recur+0x43/0x110 [ 126.943758] bpf_trampoline_6442539812+0x4b/0x110 [ 126.943851] rcu_read_lock_held+0x9/0x60 [ 126.943934] ? __htab_map_lookup_elem+0x25/0xf0 [ 126.944031] bpf_trampoline_6442491246+0x79/0x123 [ 126.944126] __htab_map_lookup_elem+0x9/0xf0 [ 126.944235] ? bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x129/0x14a [ 126.944345] ? bpf_prog_5c5e9b8ca18045f2_bpfsnoop_fgraph+0x12e/0x3f2 [ 126.944442] bpf_prog_5c5b59f2388bb72a_bpfsnoop_fgraph+0x12e/0x3f2 [ 126.944545] ? lock_release+0xc6/0x280 [ 126.944640] ? __bpf_prog_enter_recur+0x43/0x110 [ 126.944732] bpf_trampoline_6442491246+0x56/0x123 [ 126.944828] __htab_map_lookup_elem+0x9/0xf0 [ 126.944931] ? bpf_prog_243665d136749c2c_bpfsnoop_fgraph_tailcallee+0x129/0x14a [ 126.945019] ? tick_nohz_idle_exit+0xc9/0x110 [ 126.945108] bpf_prog_5c5e9b8ca18045f2_bpfsnoop_fgraph+0x12e/0x3f2 [ 126.945210] ? lock_release+0xc6/0x280 [ 126.945305] ? __bpf_prog_enter_recur+0x43/0x110 [ 126.945411] bpf_trampoline_6442519845+0x5e/0x133 [ 126.945510] hrtimer_start_range_ns+0x9/0x4b0 [ 126.945603] ? tick_nohz_restart_sched_tick+0x89/0xe0 [ 126.945694] tick_nohz_idle_exit+0xc9/0x110 [ 126.945789] do_idle+0x150/0x250 [ 126.945890] cpu_startup_entry+0x2d/0x30 [ 126.945976] start_secondary+0xfc/0x100 [ 126.946069] common_startup_64+0x12c/0x138 [ 126.946197] </TASK> Full log: [2]. Additional information: * Kernel version: 6.17.0-rc1-gcb708c11617a * Config: [3]. Links: [0] https://github.com/bpfsnoop/bpfsnoop [1] https://github.com/danobi/vmtest [2] https://gist.githubusercontent.com/Asphaltt/88d11c49e62485f4d4f4a7664089c3cd/raw/f26c123c0ec5f3e5ac588844db51bbec0bb0f9c7/deadlock-crash.log [3] https://gist.githubusercontent.com/Asphaltt/88d11c49e62485f4d4f4a7664089c3cd/raw/f26c123c0ec5f3e5ac588844db51bbec0bb0f9c7/config Thanks, Leon ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-27 2:13 [BUG] Deadlock triggered by bpfsnoop funcgraph feature Leon Hwang @ 2025-08-27 2:23 ` Alexei Starovoitov 2025-08-27 2:58 ` Leon Hwang 0 siblings, 1 reply; 12+ messages in thread From: Alexei Starovoitov @ 2025-08-27 2:23 UTC (permalink / raw) To: Leon Hwang Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote: > > Hi, > > I’ve encountered a reproducible deadlock while developing the funcgraph > feature for bpfsnoop [0]. debug it pls. Sounds like you're implying that the root cause is in bpf, but why do you think so? You're attaching to things that shouldn't be attached to. Like rcu_lockdep_current_cpu_online() so effectively you're recursing in that lockdep code. See big lock there. It will dead lock for sure. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-27 2:23 ` Alexei Starovoitov @ 2025-08-27 2:58 ` Leon Hwang 2025-08-28 0:42 ` Alexei Starovoitov 0 siblings, 1 reply; 12+ messages in thread From: Leon Hwang @ 2025-08-27 2:58 UTC (permalink / raw) To: Alexei Starovoitov Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On 27/8/25 10:23, Alexei Starovoitov wrote: > On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote: >> >> Hi, >> >> I’ve encountered a reproducible deadlock while developing the funcgraph >> feature for bpfsnoop [0]. > > debug it pls. It’s quite difficult for me. I’ve tried debugging it but didn’t succeed. > Sounds like you're implying that the root cause is in bpf, > but why do you think so? > > You're attaching to things that shouldn't be attached to. > Like rcu_lockdep_current_cpu_online() > so effectively you're recursing in that lockdep code. > See big lock there. It will dead lock for sure. If a function that acquires a lock can be traced by a tracing program, bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a deadlock is highly likely to occur. With bpfsnoop I try my best to avoid such deadlock issues. But what about other bpf tracing tools? If they don’t handle this properly, the kernel is very likely to crash. Thanks, Leon ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-27 2:58 ` Leon Hwang @ 2025-08-28 0:42 ` Alexei Starovoitov 2025-08-28 2:40 ` Leon Hwang 0 siblings, 1 reply; 12+ messages in thread From: Alexei Starovoitov @ 2025-08-28 0:42 UTC (permalink / raw) To: Leon Hwang, Paul E. McKenney Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote: > > > > On 27/8/25 10:23, Alexei Starovoitov wrote: > > On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote: > >> > >> Hi, > >> > >> I’ve encountered a reproducible deadlock while developing the funcgraph > >> feature for bpfsnoop [0]. > > > > debug it pls. > > It’s quite difficult for me. I’ve tried debugging it but didn’t succeed. > > > Sounds like you're implying that the root cause is in bpf, > > but why do you think so? > > > > You're attaching to things that shouldn't be attached to. > > Like rcu_lockdep_current_cpu_online() > > so effectively you're recursing in that lockdep code. > > See big lock there. It will dead lock for sure. > > If a function that acquires a lock can be traced by a tracing program, > bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a > deadlock is highly likely to occur. > > With bpfsnoop I try my best to avoid such deadlock issues. But what > about other bpf tracing tools? If they don’t handle this properly, the > kernel is very likely to crash. bpf infra is trying hard not to crash it, but debug kernel is a different category. rcu_read_lock_held() doesn't exist in production kernels. You can propose adding "notrace" for it, but in general that doesn't scale. Same with rcu_lockdep_current_cpu_online(). It probably deserves "notrace" too. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-28 0:42 ` Alexei Starovoitov @ 2025-08-28 2:40 ` Leon Hwang 2025-08-28 11:50 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Leon Hwang @ 2025-08-28 2:40 UTC (permalink / raw) To: Alexei Starovoitov, Paul E. McKenney Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On 28/8/25 08:42, Alexei Starovoitov wrote: > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote: >> >> >> >> On 27/8/25 10:23, Alexei Starovoitov wrote: >>> On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote: >>>> >>>> Hi, >>>> >>>> I’ve encountered a reproducible deadlock while developing the funcgraph >>>> feature for bpfsnoop [0]. >>> >>> debug it pls. >> >> It’s quite difficult for me. I’ve tried debugging it but didn’t succeed. >> >>> Sounds like you're implying that the root cause is in bpf, >>> but why do you think so? >>> >>> You're attaching to things that shouldn't be attached to. >>> Like rcu_lockdep_current_cpu_online() >>> so effectively you're recursing in that lockdep code. >>> See big lock there. It will dead lock for sure. >> >> If a function that acquires a lock can be traced by a tracing program, >> bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a >> deadlock is highly likely to occur. >> >> With bpfsnoop I try my best to avoid such deadlock issues. But what >> about other bpf tracing tools? If they don’t handle this properly, the >> kernel is very likely to crash. > > bpf infra is trying hard not to crash it, but debug kernel is a different > category. rcu_read_lock_held() doesn't exist in production kernels. > You can propose adding "notrace" for it, but in general that doesn't scale. > Same with rcu_lockdep_current_cpu_online(). > It probably deserves "notrace" too. Indeed, it doesn't scale. When I run ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug --fgraph-exclude 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack', the kernel doesn’t panic, but the OS eventually stalls and becomes unresponsive to key presses. It seems preferable to avoid running BPF programs continuously in such cases. Thanks, Leon ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-28 2:40 ` Leon Hwang @ 2025-08-28 11:50 ` Paul E. McKenney 2025-08-28 13:39 ` Leon Hwang 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2025-08-28 11:50 UTC (permalink / raw) To: Leon Hwang Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote: > On 28/8/25 08:42, Alexei Starovoitov wrote: > > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote: > >> On 27/8/25 10:23, Alexei Starovoitov wrote: > >>> On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@linux.dev> wrote: > >>>> > >>>> Hi, > >>>> > >>>> I’ve encountered a reproducible deadlock while developing the funcgraph > >>>> feature for bpfsnoop [0]. > >>> > >>> debug it pls. > >> > >> It’s quite difficult for me. I’ve tried debugging it but didn’t succeed. > >> > >>> Sounds like you're implying that the root cause is in bpf, > >>> but why do you think so? > >>> > >>> You're attaching to things that shouldn't be attached to. > >>> Like rcu_lockdep_current_cpu_online() > >>> so effectively you're recursing in that lockdep code. > >>> See big lock there. It will dead lock for sure. > >> > >> If a function that acquires a lock can be traced by a tracing program, > >> bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a > >> deadlock is highly likely to occur. > >> > >> With bpfsnoop I try my best to avoid such deadlock issues. But what > >> about other bpf tracing tools? If they don’t handle this properly, the > >> kernel is very likely to crash. > > > > bpf infra is trying hard not to crash it, but debug kernel is a different > > category. rcu_read_lock_held() doesn't exist in production kernels. > > You can propose adding "notrace" for it, but in general that doesn't scale. > > Same with rcu_lockdep_current_cpu_online(). > > It probably deserves "notrace" too. > > Indeed, it doesn't scale. > > When I run > ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug > --fgraph-exclude > 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack', > the kernel doesn’t panic, but the OS eventually stalls and becomes > unresponsive to key presses. > > It seems preferable to avoid running BPF programs continuously in such > cases. Agreed, when adding code to the Linux kernel, whether via a patch, via a BPF program, or by whatever other means, you are taking responsibility for the speed, scalability, and latency effects of that code. Nevertheless, I am happy to add a few "notrace" modifiers if needed. Do you guys need them for rcu_read_lock_held() and rcu_lockdep_current_cpu_online()? Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-28 11:50 ` Paul E. McKenney @ 2025-08-28 13:39 ` Leon Hwang 2025-08-28 16:43 ` Alexei Starovoitov 0 siblings, 1 reply; 12+ messages in thread From: Leon Hwang @ 2025-08-28 13:39 UTC (permalink / raw) To: paulmck Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On Thu Aug 28, 2025 at 7:50 PM +08, Paul E. McKenney wrote: > On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote: >> On 28/8/25 08:42, Alexei Starovoitov wrote: >> > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote: [...] >> > >> > bpf infra is trying hard not to crash it, but debug kernel is a different >> > category. rcu_read_lock_held() doesn't exist in production kernels. >> > You can propose adding "notrace" for it, but in general that doesn't scale. >> > Same with rcu_lockdep_current_cpu_online(). >> > It probably deserves "notrace" too. >> >> Indeed, it doesn't scale. >> >> When I run >> ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug >> --fgraph-exclude >> 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack', >> the kernel doesn’t panic, but the OS eventually stalls and becomes >> unresponsive to key presses. >> >> It seems preferable to avoid running BPF programs continuously in such >> cases. > > Agreed, when adding code to the Linux kernel, whether via a patch, via > a BPF program, or by whatever other means, you are taking responsibility > for the speed, scalability, and latency effects of that code. > > Nevertheless, I am happy to add a few "notrace" modifiers > if needed. Do you guys need them for rcu_read_lock_held() and > rcu_lockdep_current_cpu_online()? > I think it would be better to add "notrace" to following functions: ./bpfsnoop -k 'rcu_read_*lock_*held*,rcu_lockdep_*' --show-func-proto bool rcu_lockdep_current_cpu_online(); [traceable] int rcu_read_lock_any_held(); [traceable] int rcu_read_lock_bh_held(); [traceable] int rcu_read_lock_held(); [traceable] int rcu_read_lock_sched_held(); [traceable] Thanks, Leon ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-28 13:39 ` Leon Hwang @ 2025-08-28 16:43 ` Alexei Starovoitov 2025-08-28 17:24 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Alexei Starovoitov @ 2025-08-28 16:43 UTC (permalink / raw) To: Leon Hwang Cc: Paul E. McKenney, bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On Thu, Aug 28, 2025 at 6:39 AM Leon Hwang <leon.hwang@linux.dev> wrote: > > On Thu Aug 28, 2025 at 7:50 PM +08, Paul E. McKenney wrote: > > On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote: > >> On 28/8/25 08:42, Alexei Starovoitov wrote: > >> > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote: > > [...] > > >> > > >> > bpf infra is trying hard not to crash it, but debug kernel is a different > >> > category. rcu_read_lock_held() doesn't exist in production kernels. > >> > You can propose adding "notrace" for it, but in general that doesn't scale. > >> > Same with rcu_lockdep_current_cpu_online(). > >> > It probably deserves "notrace" too. > >> > >> Indeed, it doesn't scale. > >> > >> When I run > >> ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug > >> --fgraph-exclude > >> 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack', > >> the kernel doesn’t panic, but the OS eventually stalls and becomes > >> unresponsive to key presses. > >> > >> It seems preferable to avoid running BPF programs continuously in such > >> cases. > > > > Agreed, when adding code to the Linux kernel, whether via a patch, via > > a BPF program, or by whatever other means, you are taking responsibility > > for the speed, scalability, and latency effects of that code. > > > > Nevertheless, I am happy to add a few "notrace" modifiers > > if needed. Do you guys need them for rcu_read_lock_held() and > > rcu_lockdep_current_cpu_online()? > > > > I think it would be better to add "notrace" to following functions: > > ./bpfsnoop -k 'rcu_read_*lock_*held*,rcu_lockdep_*' --show-func-proto > bool rcu_lockdep_current_cpu_online(); [traceable] > int rcu_read_lock_any_held(); [traceable] > int rcu_read_lock_bh_held(); [traceable] > int rcu_read_lock_held(); [traceable] > int rcu_read_lock_sched_held(); [traceable] Agree. Seems like an easy way to remove a footgun. Independently it would be good to make noinstr/notrace to include __cpuidle functions. I think right now it's allowed to attach to default_idle() which is causing issues. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-28 16:43 ` Alexei Starovoitov @ 2025-08-28 17:24 ` Paul E. McKenney 2025-08-29 2:21 ` Leon Hwang 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2025-08-28 17:24 UTC (permalink / raw) To: Alexei Starovoitov Cc: Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On Thu, Aug 28, 2025 at 09:43:00AM -0700, Alexei Starovoitov wrote: > On Thu, Aug 28, 2025 at 6:39 AM Leon Hwang <leon.hwang@linux.dev> wrote: > > > > On Thu Aug 28, 2025 at 7:50 PM +08, Paul E. McKenney wrote: > > > On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote: > > >> On 28/8/25 08:42, Alexei Starovoitov wrote: > > >> > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@linux.dev> wrote: > > > > [...] > > > > >> > > > >> > bpf infra is trying hard not to crash it, but debug kernel is a different > > >> > category. rcu_read_lock_held() doesn't exist in production kernels. > > >> > You can propose adding "notrace" for it, but in general that doesn't scale. > > >> > Same with rcu_lockdep_current_cpu_online(). > > >> > It probably deserves "notrace" too. > > >> > > >> Indeed, it doesn't scale. > > >> > > >> When I run > > >> ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug > > >> --fgraph-exclude > > >> 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack', > > >> the kernel doesn’t panic, but the OS eventually stalls and becomes > > >> unresponsive to key presses. > > >> > > >> It seems preferable to avoid running BPF programs continuously in such > > >> cases. > > > > > > Agreed, when adding code to the Linux kernel, whether via a patch, via > > > a BPF program, or by whatever other means, you are taking responsibility > > > for the speed, scalability, and latency effects of that code. > > > > > > Nevertheless, I am happy to add a few "notrace" modifiers > > > if needed. Do you guys need them for rcu_read_lock_held() and > > > rcu_lockdep_current_cpu_online()? > > > > > > > I think it would be better to add "notrace" to following functions: > > > > ./bpfsnoop -k 'rcu_read_*lock_*held*,rcu_lockdep_*' --show-func-proto > > bool rcu_lockdep_current_cpu_online(); [traceable] > > int rcu_read_lock_any_held(); [traceable] > > int rcu_read_lock_bh_held(); [traceable] > > int rcu_read_lock_held(); [traceable] > > int rcu_read_lock_sched_held(); [traceable] > > Agree. Seems like an easy way to remove a footgun. Very good, and please see below. This might or might not make the next merge window, but if not, it should be good for the one after that. > Independently it would be good to make noinstr/notrace to include __cpuidle > functions. I think right now it's allowed to attach to default_idle() > which is causing issues. Leon, would you be interested in putting together a patch for these? Thanx, Paul ------------------------------------------------------------------------ commit dada60c8851f19e54524cc1bcf8ab5938eb909c9 Author: Paul E. McKenney <paulmck@kernel.org> Date: Thu Aug 28 10:17:10 2025 -0700 rcu: Mark diagnostic functions as notrace The rcu_lockdep_current_cpu_online(), rcu_read_lock_sched_held(), rcu_read_lock_held(), rcu_read_lock_bh_held(), rcu_read_lock_any_held() are used by tracing-related code paths, so putting traces on them is unlikely to make anyone happy. This commit therefore marks them all "notrace". Reported-by: Leon Hwang <leon.hwang@linux.dev> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 1291e0761d70ab..2515ee9a82df4f 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4021,7 +4021,7 @@ bool rcu_cpu_online(int cpu) * RCU on an offline processor during initial boot, hence the check for * rcu_scheduler_fully_active. */ -bool rcu_lockdep_current_cpu_online(void) +bool notrace rcu_lockdep_current_cpu_online(void) { struct rcu_data *rdp; bool ret = false; diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index c912b594ba987f..dfeba9b3539508 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -117,7 +117,7 @@ static bool rcu_read_lock_held_common(bool *ret) return false; } -int rcu_read_lock_sched_held(void) +int notrace rcu_read_lock_sched_held(void) { bool ret; @@ -342,7 +342,7 @@ EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled); * Note that rcu_read_lock() is disallowed if the CPU is either idle or * offline from an RCU perspective, so check for those as well. */ -int rcu_read_lock_held(void) +int notrace rcu_read_lock_held(void) { bool ret; @@ -367,7 +367,7 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held); * Note that rcu_read_lock_bh() is disallowed if the CPU is either idle or * offline from an RCU perspective, so check for those as well. */ -int rcu_read_lock_bh_held(void) +int notrace rcu_read_lock_bh_held(void) { bool ret; @@ -377,7 +377,7 @@ int rcu_read_lock_bh_held(void) } EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held); -int rcu_read_lock_any_held(void) +int notrace rcu_read_lock_any_held(void) { bool ret; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-28 17:24 ` Paul E. McKenney @ 2025-08-29 2:21 ` Leon Hwang 2025-08-29 18:08 ` Alexei Starovoitov 0 siblings, 1 reply; 12+ messages in thread From: Leon Hwang @ 2025-08-29 2:21 UTC (permalink / raw) To: paulmck, Alexei Starovoitov Cc: Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa On 29/8/25 01:24, Paul E. McKenney wrote: > On Thu, Aug 28, 2025 at 09:43:00AM -0700, Alexei Starovoitov wrote: >> On Thu, Aug 28, 2025 at 6:39 AM Leon Hwang <leon.hwang@linux.dev> wrote: >>> >>> On Thu Aug 28, 2025 at 7:50 PM +08, Paul E. McKenney wrote: [...] >>> >>> I think it would be better to add "notrace" to following functions: >>> >>> ./bpfsnoop -k 'rcu_read_*lock_*held*,rcu_lockdep_*' --show-func-proto >>> bool rcu_lockdep_current_cpu_online(); [traceable] >>> int rcu_read_lock_any_held(); [traceable] >>> int rcu_read_lock_bh_held(); [traceable] >>> int rcu_read_lock_held(); [traceable] >>> int rcu_read_lock_sched_held(); [traceable] >> >> Agree. Seems like an easy way to remove a footgun. > > Very good, and please see below. This might or might not make the next > merge window, but if not, it should be good for the one after that. > >> Independently it would be good to make noinstr/notrace to include __cpuidle >> functions. I think right now it's allowed to attach to default_idle() >> which is causing issues. Nope. ./bpfsnoop -k 'default_idle' -k '__cpuidle' --show-func-proto void default_idle(); Here are the traced functions of ./bpfsnoop -D -k "htab_*_elem" --output-fgraph --fgraph-debug --fgraph-exclude 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack': htab_lru_map_delete_elem htab_lru_map_lookup_and_delete_elem htab_lru_map_lookup_elem htab_lru_map_update_elem htab_lru_percpu_map_lookup_and_delete_elem htab_lru_percpu_map_lookup_elem htab_lru_percpu_map_lookup_percpu_elem htab_lru_percpu_map_update_elem htab_map_delete_elem htab_map_lookup_and_delete_elem htab_map_lookup_elem htab_map_seq_show_elem htab_map_update_elem htab_of_map_lookup_elem htab_percpu_map_lookup_and_delete_elem htab_percpu_map_lookup_elem htab_percpu_map_lookup_percpu_elem htab_percpu_map_seq_show_elem htab_percpu_map_update_elem add_taint alloc_htab_elem arch_scale_cpu_capacity arch_stack_walk bpf_list_head_free __bpf_obj_drop_impl bpf_obj_free_fields __bpf_prog_put bpf_prog_put bpf_rb_root_free bpf_timer_cancel_and_free bpf_wq_cancel_and_free btf_find_struct_meta btf_is_kernel btf_type_seq_show btf_type_seq_show_flags btf_type_show check_and_free_fields check_panic_on_warn check_timeout clear_pending_if_disabled console_verbose copy_map_value_locked ctx_sched_out __delayacct_blkio_start __do_set_cpus_allowed do_trace_write_msr enter_lazy_tlb __folio_put __folio_unqueue_deferred_split free_frozen_pages free_htab_elem free_huge_folio get_free_pages_noprof get_state_synchronize_rcu get_state_synchronize_rcu_full get_taint hrtick_start_fair hrtimer_active hrtimer_cancel hrtimer_setup hrtimer_start_range_ns __htab_lru_percpu_map_update_elem __htab_map_lookup_and_delete_elem __htab_map_lookup_elem htab_map_update_elem_in_place invalidate_user_asid __is_kernel_percpu_address is_kernel_percpu_address is_module_address __is_module_percpu_address is_module_percpu_address is_vmalloc_addr kallsyms_lookup kexec_crash_loaded kvfree_call_rcu local_clock __mem_cgroup_uncharge __might_sleep mm_needs_global_asid mod_node_page_state __msecs_to_jiffies nbcon_cpu_emergency_enter nbcon_cpu_emergency_exit nbcon_get_cpu_emergency_nesting pcpu_copy_value perf_cgroup_switch perf_clear_dirty_counters perf_ctx_disable perf_ctx_enable perf_ctx_sched_task_cb __perf_event_task_sched_out perf_event_update_time perf_event_update_userpage perf_exclude_event perf_pmu_sched_task ___perf_sw_event perf_swevent_event pick_next_task_fair pick_task_fair pick_task_fair pick_task_idle preempt_model_str __printk_cpu_sync_put __printk_cpu_sync_try_get __printk_cpu_sync_wait printk_percpu_data_ready print_modules print_stop_info print_tainted print_tainted_verbose print_worker_info profile_hits put_prev_entity queue_delayed_work_on queued_spin_lock_slowpath __queue_work queue_work_on rcu_note_context_switch rcu_qs rcu_report_exp_cpu_mult rcu_tasks_trace_qs_blkd rcu_trc_cmpxchg_need_qs resilient_queued_spin_lock_slowpath sched_balance_newidle __schedule_bug __schedule_delayed_monitor_work seq_putc seq_write set_next_entity stack_trace_print stack_trace_save switch_ldt switch_mm_irqs_off synchronize_rcu task_h_load task_work_add this_cpu_in_panic touch_all_softlockup_watchdogs __traceiter_contention_begin __traceiter_contention_end __traceiter_lock_acquire __traceiter_lock_release __traceiter_rcu_preempt_task __traceiter_sched_entry_tp __traceiter_sched_exit_tp __traceiter_tlb_flush unpin_user_page __update_context_time update_rq_clock > > Leon, would you be interested in putting together a patch for these? > I think we can apply this patch first. If other patch is required, I would like to send it later. Thanks, Leon ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-29 2:21 ` Leon Hwang @ 2025-08-29 18:08 ` Alexei Starovoitov 2025-09-01 2:38 ` Leon Hwang 0 siblings, 1 reply; 12+ messages in thread From: Alexei Starovoitov @ 2025-08-29 18:08 UTC (permalink / raw) To: Leon Hwang, Jiri Olsa Cc: Paul E. McKenney, Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann On Thu, Aug 28, 2025 at 7:21 PM Leon Hwang <hffilwlqm@gmail.com> wrote: > > > > > >> Independently it would be good to make noinstr/notrace to include __cpuidle > >> functions. I think right now it's allowed to attach to default_idle() > >> which is causing issues. > > Nope. > ./bpfsnoop -k 'default_idle' -k '__cpuidle' --show-func-proto > void default_idle(); hmm. indeed. Jiri, why do we still have this in selftest ? static bool skip_entry(char *name) { /* * We attach to almost all kernel functions and some of them * will cause 'suspicious RCU usage' when fprobe is attached * to them. Filter out the current culprits - arch_cpu_idle * default_idle and rcu_* functions. */ if (!strcmp(name, "arch_cpu_idle")) return true; if (!strcmp(name, "default_idle")) return true; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature 2025-08-29 18:08 ` Alexei Starovoitov @ 2025-09-01 2:38 ` Leon Hwang 0 siblings, 0 replies; 12+ messages in thread From: Leon Hwang @ 2025-09-01 2:38 UTC (permalink / raw) To: Alexei Starovoitov, Leon Hwang, Jiri Olsa Cc: Paul E. McKenney, bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann On 30/8/25 02:08, Alexei Starovoitov wrote: > On Thu, Aug 28, 2025 at 7:21 PM Leon Hwang <hffilwlqm@gmail.com> wrote: >> >> >>> >>>> Independently it would be good to make noinstr/notrace to include __cpuidle >>>> functions. I think right now it's allowed to attach to default_idle() >>>> which is causing issues. >> >> Nope. >> ./bpfsnoop -k 'default_idle' -k '__cpuidle' --show-func-proto >> void default_idle(); > > hmm. indeed. > > Jiri, > > why do we still have this in selftest ? > > static bool skip_entry(char *name) > { > /* > * We attach to almost all kernel functions and some of them > * will cause 'suspicious RCU usage' when fprobe is attached > * to them. Filter out the current culprits - arch_cpu_idle > * default_idle and rcu_* functions. > */ > if (!strcmp(name, "arch_cpu_idle")) > return true; > if (!strcmp(name, "default_idle")) > return true; After removing "arch_cpu_idle" and "default_idle", it's OK to run the selftest: ./test_progs -t kprobe_multi_test #155/1 kprobe_multi_test/skel_api:OK #155/2 kprobe_multi_test/link_api_addrs:OK #155/3 kprobe_multi_test/link_api_syms:OK #155/4 kprobe_multi_test/attach_api_pattern:OK #155/5 kprobe_multi_test/attach_api_addrs:OK #155/6 kprobe_multi_test/attach_api_syms:OK #155/7 kprobe_multi_test/attach_api_fails:OK #155/8 kprobe_multi_test/attach_override:OK #155/9 kprobe_multi_test/session:OK #155/10 kprobe_multi_test/session_cookie:OK #155/11 kprobe_multi_test/unique_match:OK #155/12 kprobe_multi_test/kprobe_session_return_0:OK #155/13 kprobe_multi_test/kprobe_session_return_1:OK #155/14 kprobe_multi_test/kprobe_session_return_2:OK #155 kprobe_multi_test:OK #156/1 kprobe_multi_testmod_test/testmod_attach_api_syms:OK #156/2 kprobe_multi_testmod_test/testmod_attach_api_addrs:OK #156 kprobe_multi_testmod_test:OK Summary: 2/16 PASSED, 0 SKIPPED, 0 FAILED Thanks, Leon ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-09-01 2:38 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-27 2:13 [BUG] Deadlock triggered by bpfsnoop funcgraph feature Leon Hwang 2025-08-27 2:23 ` Alexei Starovoitov 2025-08-27 2:58 ` Leon Hwang 2025-08-28 0:42 ` Alexei Starovoitov 2025-08-28 2:40 ` Leon Hwang 2025-08-28 11:50 ` Paul E. McKenney 2025-08-28 13:39 ` Leon Hwang 2025-08-28 16:43 ` Alexei Starovoitov 2025-08-28 17:24 ` Paul E. McKenney 2025-08-29 2:21 ` Leon Hwang 2025-08-29 18:08 ` Alexei Starovoitov 2025-09-01 2:38 ` Leon Hwang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).