* bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run() @ 2023-10-27 9:08 Hao Sun 2023-10-29 0:34 ` Alexei Starovoitov 0 siblings, 1 reply; 4+ messages in thread From: Hao Sun @ 2023-10-27 9:08 UTC (permalink / raw) To: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan Cc: bpf, Linux Kernel Mailing List Hi, The following C repro contains a bpf program that can cause rcu stall/soft lockup during running in bpf_prog_run(). Seems the verifier incorrectly passed the program with an infinite loop. C repro: https://pastebin.com/raw/ymzAxjeU Verifier's log: https://pastebin.com/raw/thZDTFJc rcu stall: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 7-....: (10500 ticks this GP) idle=c144/1/0x4000000000000000 softirq=6017/6017 fqs=4579 rcu: hardirqs softirqs csw/system rcu: number: 0 212 0 rcu: cputime: 0 0 52479 ==> 52480(ms) rcu: (t=10501 jiffies g=8277 q=132 ncpus=8) CPU: 7 PID: 8633 Comm: bpf-test Not tainted 6.6.0-rc5-01400-g7c2f6c9fb91f-dirty #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:___bpf_prog_run+0x4cbf/0x9720 kernel/bpf/core.c:2099 Code: 80 3c 38 00 0f 85 c0 44 00 00 41 0f b6 44 24 01 4c 8b 2b c0 e8 04 0f b6 c0 48 8d 5c c5 00 48 89 d8 48 c1 e8 03 42 80 3c 38 00 <0f> 85 8d 44 00 00 4c 3b 2b 0f 86 16 0f 00 00 49 8d 7c 24 02 48 89 RSP: 0018:ffffc90006eb7a58 EFLAGS: 00000246 RAX: 1ffff92000dd6f70 RBX: ffffc90006eb7b80 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00000000000014ff RDI: ffffc900052160d1 RBP: ffffc90006eb7b48 R08: ffffc90005216144 R09: fffffbfff228f9d0 R10: ffffffff9147ce87 R11: 0000000000088001 R12: ffffc900052160d0 R13: 000000000000001f R14: 0000000000000000 R15: dffffc0000000000 FS: 00007fcc4dc606c0(0000) GS:ffff88832db80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcc4dcb6000 CR3: 0000000157724000 CR4: 00000000000006e0 Call Trace: <IRQ> </IRQ> <TASK> __bpf_prog_run32+0x8d/0xd0 kernel/bpf/core.c:2264 bpf_dispatcher_nop_func include/linux/bpf.h:1192 [inline] __bpf_prog_run include/linux/filter.h:651 [inline] bpf_prog_run include/linux/filter.h:658 [inline] __bpf_prog_test_run_raw_tp+0xc5/0x2c0 net/bpf/test_run.c:712 bpf_prog_test_run_raw_tp+0x304/0x560 net/bpf/test_run.c:752 bpf_prog_put kernel/bpf/syscall.c:2165 [inline] bpf_prog_test_run kernel/bpf/syscall.c:4042 [inline] __sys_bpf+0xf98/0x4380 kernel/bpf/syscall.c:5401 __do_sys_bpf kernel/bpf/syscall.c:5487 [inline] __se_sys_bpf kernel/bpf/syscall.c:5485 [inline] __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:5485 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x564499114a87 Code: 81 f9 01 02 00 00 72 03 49 8b 12 48 89 54 24 28 89 44 24 1c 48 8d 74 24 10 b8 41 01 00 00 bf 0a 00 00 00 ba 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 8a 00 00 00 8b 7c 24 3c 49 8b 88 18 08 00 RSP: 002b:00007fcc4dc594d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00007fcc4dc59860 RCX: 0000564499114a87 RDX: 0000000000000050 RSI: 00007fcc4dc594e0 RDI: 000000000000000a RBP: 00007fcc48002d00 R08: 00005644993d4e10 R09: 00005644993d5428 R10: 00005644993d5018 R11: 0000000000000246 R12: 00007fcc4dc5e458 R13: 00007fcc4dc5bb40 R14: 00007fcc4fd1cbb0 R15: 00007fcc4dc5bc28 </TASK> watchdog: BUG: soft lockup - CPU#7 stuck for 246s! [bpf-test:8633] Modules linked in: irq event stamp: 64410 hardirqs last enabled at (64409): [<ffffffff8960140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 hardirqs last disabled at (64410): [<ffffffff89455f6f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1074 softirqs last enabled at (64378): [<ffffffff8145aa47>] invoke_softirq kernel/softirq.c:427 [inline] softirqs last enabled at (64378): [<ffffffff8145aa47>] __irq_exit_rcu kernel/softirq.c:632 [inline] softirqs last enabled at (64378): [<ffffffff8145aa47>] irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644 softirqs last disabled at (64373): [<ffffffff8145aa47>] invoke_softirq kernel/softirq.c:427 [inline] softirqs last disabled at (64373): [<ffffffff8145aa47>] __irq_exit_rcu kernel/softirq.c:632 [inline] softirqs last disabled at (64373): [<ffffffff8145aa47>] irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644 CPU: 7 PID: 8633 Comm: bpf-test Not tainted 6.6.0-rc5-01400-g7c2f6c9fb91f-dirty #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kcov_trace_bpf_prog_regs+0x68/0xc0 kernel/kcov.c:248 Code: 83 f8 04 75 5c 4c 8b a2 e8 15 00 00 8b 92 e4 15 00 00 49 8b 04 24 48 c1 e2 03 48 8d 34 80 48 8d 4c 36 0b 48 c1 e1 03 48 39 ca <72> 35 48 83 c0 01 49 89 04 24 48 c1 e6 04 49 89 7c 0c b0 31 db 49 RSP: 0018:ffffc90006eb7a38 EFLAGS: 00000297 RAX: 0000000000000433 RBX: ffffc90006eb7b48 RCX: 0000000000015048 RDX: 0000000000015000 RSI: 00000000000014ff RDI: 000000000000001c RBP: ffffc90006eb7b48 R08: ffffc90005216144 R09: fffffbfff228f9d0 R10: ffffffff9147ce87 R11: 0000000000088001 R12: ffffc90006c39000 R13: ffff1102f32b8600 R14: 0000000000000000 R15: dffffc0000000000 FS: 00007fcc4dc606c0(0000) GS:ffff88832db80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcc4dcb6000 CR3: 0000000157724000 CR4: 00000000000006e0 Call Trace: <IRQ> </IRQ> <TASK> ___bpf_prog_run+0x53d9/0x9720 kernel/bpf/core.c:2100 __bpf_prog_run32+0x8d/0xd0 kernel/bpf/core.c:2264 bpf_dispatcher_nop_func include/linux/bpf.h:1192 [inline] __bpf_prog_run include/linux/filter.h:651 [inline] bpf_prog_run include/linux/filter.h:658 [inline] __bpf_prog_test_run_raw_tp+0xc5/0x2c0 net/bpf/test_run.c:712 bpf_prog_test_run_raw_tp+0x304/0x560 net/bpf/test_run.c:752 bpf_prog_put kernel/bpf/syscall.c:2165 [inline] bpf_prog_test_run kernel/bpf/syscall.c:4042 [inline] __sys_bpf+0xf98/0x4380 kernel/bpf/syscall.c:5401 __do_sys_bpf kernel/bpf/syscall.c:5487 [inline] __se_sys_bpf kernel/bpf/syscall.c:5485 [inline] __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:5485 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x564499114a87 Code: 81 f9 01 02 00 00 72 03 49 8b 12 48 89 54 24 28 89 44 24 1c 48 8d 74 24 10 b8 41 01 00 00 bf 0a 00 00 00 ba 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 8a 00 00 00 8b 7c 24 3c 49 8b 88 18 08 00 RSP: 002b:00007fcc4dc594d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00007fcc4dc59860 RCX: 0000564499114a87 RDX: 0000000000000050 RSI: 00007fcc4dc594e0 RDI: 000000000000000a RBP: 00007fcc48002d00 R08: 00005644993d4e10 R09: 00005644993d5428 R10: 00005644993d5018 R11: 0000000000000246 R12: 00007fcc4dc5e458 R13: 00007fcc4dc5bb40 R14: 00007fcc4fd1cbb0 R15: 00007fcc4dc5bc28 </TASK> ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run() 2023-10-27 9:08 bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run() Hao Sun @ 2023-10-29 0:34 ` Alexei Starovoitov 2023-10-30 10:29 ` Hao Sun 0 siblings, 1 reply; 4+ messages in thread From: Alexei Starovoitov @ 2023-10-29 0:34 UTC (permalink / raw) To: Hao Sun Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan, bpf, Linux Kernel Mailing List On Fri, Oct 27, 2023 at 2:09 AM Hao Sun <sunhao.th@gmail.com> wrote: > > Hi, > > The following C repro contains a bpf program that can cause rcu > stall/soft lockup during running in bpf_prog_run(). Seems the verifier > incorrectly passed the program with an infinite loop. > > C repro: https://pastebin.com/raw/ymzAxjeU Thanks for the report. Did you debug what exactly caused this bug? Are you planning to work on the fix? > Verifier's log: https://pastebin.com/raw/thZDTFJc log is trimmed. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run() 2023-10-29 0:34 ` Alexei Starovoitov @ 2023-10-30 10:29 ` Hao Sun 2023-10-30 18:16 ` Eduard Zingerman 0 siblings, 1 reply; 4+ messages in thread From: Hao Sun @ 2023-10-30 10:29 UTC (permalink / raw) To: Alexei Starovoitov Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan, bpf, Linux Kernel Mailing List On Sun, Oct 29, 2023 at 2:35 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, Oct 27, 2023 at 2:09 AM Hao Sun <sunhao.th@gmail.com> wrote: > > > > Hi, > > > > The following C repro contains a bpf program that can cause rcu > > stall/soft lockup during running in bpf_prog_run(). Seems the verifier > > incorrectly passed the program with an infinite loop. > > > > C repro: https://pastebin.com/raw/ymzAxjeU > > Thanks for the report. > Did you debug what exactly caused this bug? > Are you planning to work on the fix? This bug is really hard to debug. Here is a simplified view of the original program: loop: 0: r4 = r8 1: r1 = 0x1f 2: r8 -= -8 3: if r1 > r7 goto pc+1 4: r7 <<= r1 ; LSH r7 by 31 5: r5 = r0 6: r5 *= 2 7: if r5 < r0 goto pc+1 8: r8 s>>= 6 9: w7 &= w7 ; r7 = 0 after the first iter 10: r8 -= r7 11: r8 -= -1 12: if r4 >= 0x9 goto loop 13: exit At runtime, r7 is updated to 0 through #4 and #9 at the first iteration, so the following iteration will not take #3 to #4, so #3 can be ignored after the first iteration. r0 is init by get_current_task, and r5 is always smaller than r0 at runtime, so #7 to #8 will never run. So, the update to r8 is only #2 and #11, which together add 9 to r8. Since r4 is set to r8 at the start of each iteration, so it's an infinite loop at runtime. Based on the log, the verifier keeps tracking #7 to #8 and to #9, and at some point, the verifier prunes states and path from #7 to #9, so it stops checking. The log is huge and hard to follow, the issue is likely in pruning logic, but I don't have much knowledge about that part. > > > Verifier's log: https://pastebin.com/raw/thZDTFJc > > log is trimmed. Full log: https://pastebin.com/raw/cTC8wmDH ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run() 2023-10-30 10:29 ` Hao Sun @ 2023-10-30 18:16 ` Eduard Zingerman 0 siblings, 0 replies; 4+ messages in thread From: Eduard Zingerman @ 2023-10-30 18:16 UTC (permalink / raw) To: Hao Sun, Alexei Starovoitov Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan, bpf, Linux Kernel Mailing List On Mon, 2023-10-30 at 11:29 +0100, Hao Sun wrote: > On Sun, Oct 29, 2023 at 2:35 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Fri, Oct 27, 2023 at 2:09 AM Hao Sun <sunhao.th@gmail.com> wrote: > > > > > > Hi, > > > > > > The following C repro contains a bpf program that can cause rcu > > > stall/soft lockup during running in bpf_prog_run(). Seems the verifier > > > incorrectly passed the program with an infinite loop. > > > > > > C repro: https://pastebin.com/raw/ymzAxjeU > > > > Thanks for the report. > > Did you debug what exactly caused this bug? > > Are you planning to work on the fix? > > This bug is really hard to debug. Here is a simplified view of > the original program: > > loop: > 0: r4 = r8 > 1: r1 = 0x1f > 2: r8 -= -8 > 3: if r1 > r7 goto pc+1 > 4: r7 <<= r1 ; LSH r7 by 31 > 5: r5 = r0 > 6: r5 *= 2 > 7: if r5 < r0 goto pc+1 > 8: r8 s>>= 6 > 9: w7 &= w7 ; r7 = 0 after the first iter > 10: r8 -= r7 > 11: r8 -= -1 > 12: if r4 >= 0x9 goto loop > 13: exit > > At runtime, r7 is updated to 0 through #4 and #9 at the first iteration, > so the following iteration will not take #3 to #4, so #3 can be ignored > after the first iteration. r0 is init by get_current_task, and r5 is always > smaller than r0 at runtime, so #7 to #8 will never run. So, the update > to r8 is only #2 and #11, which together add 9 to r8. Since r4 is set > to r8 at the start of each iteration, so it's an infinite loop at runtime. > > Based on the log, the verifier keeps tracking #7 to #8 and to #9, and > at some point, the verifier prunes states and path from #7 to #9, so > it stops checking. The log is huge and hard to follow, the issue is likely > in pruning logic, but I don't have much knowledge about that part. I can take a look at this issue but closer to the end of the week (Thu/Fri). > > > > > > Verifier's log: https://pastebin.com/raw/thZDTFJc > > > > log is trimmed. > > Full log: https://pastebin.com/raw/cTC8wmDH > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-10-30 18:16 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-10-27 9:08 bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run() Hao Sun 2023-10-29 0:34 ` Alexei Starovoitov 2023-10-30 10:29 ` Hao Sun 2023-10-30 18:16 ` Eduard Zingerman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox