BPF List
 help / color / mirror / Atom feed
* bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run()
@ 2023-10-27  9:08 Hao Sun
  2023-10-29  0:34 ` Alexei Starovoitov
  0 siblings, 1 reply; 4+ messages in thread
From: Hao Sun @ 2023-10-27  9:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: bpf, Linux Kernel Mailing List

Hi,

The following C repro contains a bpf program that can cause rcu
stall/soft lockup during running in bpf_prog_run(). Seems the verifier
incorrectly passed the program with an infinite loop.

C repro: https://pastebin.com/raw/ymzAxjeU
Verifier's log: https://pastebin.com/raw/thZDTFJc

rcu stall:

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 7-....: (10500 ticks this GP) idle=c144/1/0x4000000000000000
softirq=6017/6017 fqs=4579
rcu:          hardirqs   softirqs   csw/system
rcu: number:        0        212            0
rcu: cputime:        0          0        52479   ==> 52480(ms)
rcu: (t=10501 jiffies g=8277 q=132 ncpus=8)
CPU: 7 PID: 8633 Comm: bpf-test Not tainted
6.6.0-rc5-01400-g7c2f6c9fb91f-dirty #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:___bpf_prog_run+0x4cbf/0x9720 kernel/bpf/core.c:2099
Code: 80 3c 38 00 0f 85 c0 44 00 00 41 0f b6 44 24 01 4c 8b 2b c0 e8
04 0f b6 c0 48 8d 5c c5 00 48 89 d8 48 c1 e8 03 42 80 3c 38 00 <0f> 85
8d 44 00 00 4c 3b 2b 0f 86 16 0f 00 00 49 8d 7c 24 02 48 89
RSP: 0018:ffffc90006eb7a58 EFLAGS: 00000246
RAX: 1ffff92000dd6f70 RBX: ffffc90006eb7b80 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00000000000014ff RDI: ffffc900052160d1
RBP: ffffc90006eb7b48 R08: ffffc90005216144 R09: fffffbfff228f9d0
R10: ffffffff9147ce87 R11: 0000000000088001 R12: ffffc900052160d0
R13: 000000000000001f R14: 0000000000000000 R15: dffffc0000000000
FS:  00007fcc4dc606c0(0000) GS:ffff88832db80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcc4dcb6000 CR3: 0000000157724000 CR4: 00000000000006e0
Call Trace:
 <IRQ>
 </IRQ>
 <TASK>
 __bpf_prog_run32+0x8d/0xd0 kernel/bpf/core.c:2264
 bpf_dispatcher_nop_func include/linux/bpf.h:1192 [inline]
 __bpf_prog_run include/linux/filter.h:651 [inline]
 bpf_prog_run include/linux/filter.h:658 [inline]
 __bpf_prog_test_run_raw_tp+0xc5/0x2c0 net/bpf/test_run.c:712
 bpf_prog_test_run_raw_tp+0x304/0x560 net/bpf/test_run.c:752
 bpf_prog_put kernel/bpf/syscall.c:2165 [inline]
 bpf_prog_test_run kernel/bpf/syscall.c:4042 [inline]
 __sys_bpf+0xf98/0x4380 kernel/bpf/syscall.c:5401
 __do_sys_bpf kernel/bpf/syscall.c:5487 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5485 [inline]
 __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:5485
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x564499114a87
Code: 81 f9 01 02 00 00 72 03 49 8b 12 48 89 54 24 28 89 44 24 1c 48
8d 74 24 10 b8 41 01 00 00 bf 0a 00 00 00 ba 50 00 00 00 0f 05 <48> 3d
01 f0 ff ff 0f 83 8a 00 00 00 8b 7c 24 3c 49 8b 88 18 08 00
RSP: 002b:00007fcc4dc594d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00007fcc4dc59860 RCX: 0000564499114a87
RDX: 0000000000000050 RSI: 00007fcc4dc594e0 RDI: 000000000000000a
RBP: 00007fcc48002d00 R08: 00005644993d4e10 R09: 00005644993d5428
R10: 00005644993d5018 R11: 0000000000000246 R12: 00007fcc4dc5e458
R13: 00007fcc4dc5bb40 R14: 00007fcc4fd1cbb0 R15: 00007fcc4dc5bc28
 </TASK>
watchdog: BUG: soft lockup - CPU#7 stuck for 246s! [bpf-test:8633]
Modules linked in:
irq event stamp: 64410
hardirqs last  enabled at (64409): [<ffffffff8960140a>]
asm_sysvec_apic_timer_interrupt+0x1a/0x20
arch/x86/include/asm/idtentry.h:645
hardirqs last disabled at (64410): [<ffffffff89455f6f>]
sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1074
softirqs last  enabled at (64378): [<ffffffff8145aa47>] invoke_softirq
kernel/softirq.c:427 [inline]
softirqs last  enabled at (64378): [<ffffffff8145aa47>] __irq_exit_rcu
kernel/softirq.c:632 [inline]
softirqs last  enabled at (64378): [<ffffffff8145aa47>]
irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644
softirqs last disabled at (64373): [<ffffffff8145aa47>] invoke_softirq
kernel/softirq.c:427 [inline]
softirqs last disabled at (64373): [<ffffffff8145aa47>] __irq_exit_rcu
kernel/softirq.c:632 [inline]
softirqs last disabled at (64373): [<ffffffff8145aa47>]
irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644
CPU: 7 PID: 8633 Comm: bpf-test Not tainted
6.6.0-rc5-01400-g7c2f6c9fb91f-dirty #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:kcov_trace_bpf_prog_regs+0x68/0xc0 kernel/kcov.c:248
Code: 83 f8 04 75 5c 4c 8b a2 e8 15 00 00 8b 92 e4 15 00 00 49 8b 04
24 48 c1 e2 03 48 8d 34 80 48 8d 4c 36 0b 48 c1 e1 03 48 39 ca <72> 35
48 83 c0 01 49 89 04 24 48 c1 e6 04 49 89 7c 0c b0 31 db 49
RSP: 0018:ffffc90006eb7a38 EFLAGS: 00000297
RAX: 0000000000000433 RBX: ffffc90006eb7b48 RCX: 0000000000015048
RDX: 0000000000015000 RSI: 00000000000014ff RDI: 000000000000001c
RBP: ffffc90006eb7b48 R08: ffffc90005216144 R09: fffffbfff228f9d0
R10: ffffffff9147ce87 R11: 0000000000088001 R12: ffffc90006c39000
R13: ffff1102f32b8600 R14: 0000000000000000 R15: dffffc0000000000
FS:  00007fcc4dc606c0(0000) GS:ffff88832db80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcc4dcb6000 CR3: 0000000157724000 CR4: 00000000000006e0
Call Trace:
 <IRQ>
 </IRQ>
 <TASK>
 ___bpf_prog_run+0x53d9/0x9720 kernel/bpf/core.c:2100
 __bpf_prog_run32+0x8d/0xd0 kernel/bpf/core.c:2264
 bpf_dispatcher_nop_func include/linux/bpf.h:1192 [inline]
 __bpf_prog_run include/linux/filter.h:651 [inline]
 bpf_prog_run include/linux/filter.h:658 [inline]
 __bpf_prog_test_run_raw_tp+0xc5/0x2c0 net/bpf/test_run.c:712
 bpf_prog_test_run_raw_tp+0x304/0x560 net/bpf/test_run.c:752
 bpf_prog_put kernel/bpf/syscall.c:2165 [inline]
 bpf_prog_test_run kernel/bpf/syscall.c:4042 [inline]
 __sys_bpf+0xf98/0x4380 kernel/bpf/syscall.c:5401
 __do_sys_bpf kernel/bpf/syscall.c:5487 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5485 [inline]
 __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:5485
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x564499114a87
Code: 81 f9 01 02 00 00 72 03 49 8b 12 48 89 54 24 28 89 44 24 1c 48
8d 74 24 10 b8 41 01 00 00 bf 0a 00 00 00 ba 50 00 00 00 0f 05 <48> 3d
01 f0 ff ff 0f 83 8a 00 00 00 8b 7c 24 3c 49 8b 88 18 08 00
RSP: 002b:00007fcc4dc594d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00007fcc4dc59860 RCX: 0000564499114a87
RDX: 0000000000000050 RSI: 00007fcc4dc594e0 RDI: 000000000000000a
RBP: 00007fcc48002d00 R08: 00005644993d4e10 R09: 00005644993d5428
R10: 00005644993d5018 R11: 0000000000000246 R12: 00007fcc4dc5e458
R13: 00007fcc4dc5bb40 R14: 00007fcc4fd1cbb0 R15: 00007fcc4dc5bc28
 </TASK>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run()
  2023-10-27  9:08 bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run() Hao Sun
@ 2023-10-29  0:34 ` Alexei Starovoitov
  2023-10-30 10:29   ` Hao Sun
  0 siblings, 1 reply; 4+ messages in thread
From: Alexei Starovoitov @ 2023-10-29  0:34 UTC (permalink / raw)
  To: Hao Sun
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, bpf, Linux Kernel Mailing List

On Fri, Oct 27, 2023 at 2:09 AM Hao Sun <sunhao.th@gmail.com> wrote:
>
> Hi,
>
> The following C repro contains a bpf program that can cause rcu
> stall/soft lockup during running in bpf_prog_run(). Seems the verifier
> incorrectly passed the program with an infinite loop.
>
> C repro: https://pastebin.com/raw/ymzAxjeU

Thanks for the report.
Did you debug what exactly caused this bug?
Are you planning to work on the fix?

> Verifier's log: https://pastebin.com/raw/thZDTFJc

log is trimmed.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run()
  2023-10-29  0:34 ` Alexei Starovoitov
@ 2023-10-30 10:29   ` Hao Sun
  2023-10-30 18:16     ` Eduard Zingerman
  0 siblings, 1 reply; 4+ messages in thread
From: Hao Sun @ 2023-10-30 10:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, bpf, Linux Kernel Mailing List

On Sun, Oct 29, 2023 at 2:35 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Oct 27, 2023 at 2:09 AM Hao Sun <sunhao.th@gmail.com> wrote:
> >
> > Hi,
> >
> > The following C repro contains a bpf program that can cause rcu
> > stall/soft lockup during running in bpf_prog_run(). Seems the verifier
> > incorrectly passed the program with an infinite loop.
> >
> > C repro: https://pastebin.com/raw/ymzAxjeU
>
> Thanks for the report.
> Did you debug what exactly caused this bug?
> Are you planning to work on the fix?

This bug is really hard to debug. Here is a simplified view of
the original program:

loop:
0: r4 = r8
1: r1 = 0x1f
2: r8 -= -8
3: if r1 > r7 goto pc+1
4: r7 <<= r1         ; LSH r7 by 31
5: r5 = r0
6: r5 *= 2
7: if r5 < r0 goto pc+1
8: r8 s>>= 6
9: w7 &= w7       ; r7 = 0 after the first iter
10: r8 -= r7
11: r8 -= -1
12: if r4 >= 0x9 goto loop
13: exit

At runtime, r7 is updated to 0 through #4 and #9 at the first iteration,
so the following iteration will not take #3 to #4, so #3 can be ignored
after the first iteration. r0 is init by get_current_task, and r5 is always
smaller than r0 at runtime, so #7 to #8 will never run. So, the update
to r8 is only #2 and #11, which together add 9 to r8. Since r4 is set
to r8 at the start of each iteration, so it's an infinite loop at runtime.

Based on the log, the verifier keeps tracking #7 to #8 and to #9, and
at some point, the verifier prunes states and path from #7 to #9, so
it stops checking. The log is huge and hard to follow, the issue is likely
in pruning logic, but I don't have much knowledge about that part.

>
> > Verifier's log: https://pastebin.com/raw/thZDTFJc
>
> log is trimmed.

Full log: https://pastebin.com/raw/cTC8wmDH

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run()
  2023-10-30 10:29   ` Hao Sun
@ 2023-10-30 18:16     ` Eduard Zingerman
  0 siblings, 0 replies; 4+ messages in thread
From: Eduard Zingerman @ 2023-10-30 18:16 UTC (permalink / raw)
  To: Hao Sun, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, bpf, Linux Kernel Mailing List

On Mon, 2023-10-30 at 11:29 +0100, Hao Sun wrote:
> On Sun, Oct 29, 2023 at 2:35 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Fri, Oct 27, 2023 at 2:09 AM Hao Sun <sunhao.th@gmail.com> wrote:
> > > 
> > > Hi,
> > > 
> > > The following C repro contains a bpf program that can cause rcu
> > > stall/soft lockup during running in bpf_prog_run(). Seems the verifier
> > > incorrectly passed the program with an infinite loop.
> > > 
> > > C repro: https://pastebin.com/raw/ymzAxjeU
> > 
> > Thanks for the report.
> > Did you debug what exactly caused this bug?
> > Are you planning to work on the fix?
> 
> This bug is really hard to debug. Here is a simplified view of
> the original program:
> 
> loop:
> 0: r4 = r8
> 1: r1 = 0x1f
> 2: r8 -= -8
> 3: if r1 > r7 goto pc+1
> 4: r7 <<= r1         ; LSH r7 by 31
> 5: r5 = r0
> 6: r5 *= 2
> 7: if r5 < r0 goto pc+1
> 8: r8 s>>= 6
> 9: w7 &= w7       ; r7 = 0 after the first iter
> 10: r8 -= r7
> 11: r8 -= -1
> 12: if r4 >= 0x9 goto loop
> 13: exit
> 
> At runtime, r7 is updated to 0 through #4 and #9 at the first iteration,
> so the following iteration will not take #3 to #4, so #3 can be ignored
> after the first iteration. r0 is init by get_current_task, and r5 is always
> smaller than r0 at runtime, so #7 to #8 will never run. So, the update
> to r8 is only #2 and #11, which together add 9 to r8. Since r4 is set
> to r8 at the start of each iteration, so it's an infinite loop at runtime.
> 
> Based on the log, the verifier keeps tracking #7 to #8 and to #9, and
> at some point, the verifier prunes states and path from #7 to #9, so
> it stops checking. The log is huge and hard to follow, the issue is likely
> in pruning logic, but I don't have much knowledge about that part.

I can take a look at this issue but closer to the end of the week (Thu/Fri).

> 
> > 
> > > Verifier's log: https://pastebin.com/raw/thZDTFJc
> > 
> > log is trimmed.
> 
> Full log: https://pastebin.com/raw/cTC8wmDH
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-10-30 18:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-27  9:08 bpf: incorrect passing infinate loop causing rcu detected stall during bpf_prog_run() Hao Sun
2023-10-29  0:34 ` Alexei Starovoitov
2023-10-30 10:29   ` Hao Sun
2023-10-30 18:16     ` Eduard Zingerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox