[syzbot] [bpf?] WARNING in reg_bounds_sanity

bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
@ 2025-07-02  1:55 syzbot
  2025-07-03 17:02 ` Paul Chaignon
  2025-07-05 16:02 ` syzbot
  0 siblings, 2 replies; 17+ messages in thread
From: syzbot @ 2025-07-02  1:55 UTC (permalink / raw)
  To: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

Hello,

syzbot found the following issue on:

HEAD commit:    cce3fee729ee selftests/bpf: Enable dynptr/test_probe_read_..
git tree:       bpf-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=147793d4580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
dashboard link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1594e48c580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1159388c580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/f286a7ef4940/disk-cce3fee7.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/e2f2ebe1fdc3/vmlinux-cce3fee7.xz
kernel image: https://storage.googleapis.com/syzbot-assets/6e3070663778/bzImage-cce3fee7.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com

------------[ cut here ]------------
verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0] var_off=(0x0, 0x0)(1)
WARNING: CPU: 1 PID: 5833 at kernel/bpf/verifier.c:2688 reg_bounds_sanity_check+0x6e6/0xc20 kernel/bpf/verifier.c:2682
Modules linked in:
CPU: 1 UID: 0 PID: 5833 Comm: syz-executor346 Not tainted 6.16.0-rc3-syzkaller-gcce3fee729ee #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
RIP: 0010:reg_bounds_sanity_check+0x6e6/0xc20 kernel/bpf/verifier.c:2682
Code: 24 20 4c 8b 44 24 60 4c 8b 4c 24 58 41 ff 75 00 53 41 57 55 ff 74 24 38 ff 74 24 70 ff 74 24 40 e8 8f 86 aa ff 48 83 c4 38 90 <0f> 0b 90 90 48 bb 00 00 00 00 00 fc ff df 4d 89 f7 4c 8b 74 24 08
RSP: 0018:ffffc90003f6ec08 EFLAGS: 00010282
RAX: 2c2acf8a45b1bf00 RBX: 0000000000000000 RCX: ffff888029235a00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000002
RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000004
R10: dffffc0000000000 R11: fffffbfff1bfaa04 R12: ffff888025056000
R13: ffff888025056020 R14: ffff888025056038 R15: 0000000000000000
FS:  00005555860e1380(0000) GS:ffff888125d4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002000002a1000 CR3: 00000000749a0000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 reg_set_min_max+0x264/0x300 kernel/bpf/verifier.c:16262
 check_cond_jmp_op+0x159b/0x2910 kernel/bpf/verifier.c:16705
 do_check_insn kernel/bpf/verifier.c:19882 [inline]
 do_check+0x665b/0xe080 kernel/bpf/verifier.c:20017
 do_check_common+0x188f/0x23f0 kernel/bpf/verifier.c:23180
 do_check_main kernel/bpf/verifier.c:23263 [inline]
 bpf_check+0x10252/0x1a5d0 kernel/bpf/verifier.c:24619
 bpf_prog_load+0x1318/0x1930 kernel/bpf/syscall.c:2972
 __sys_bpf+0x5f1/0x860 kernel/bpf/syscall.c:5978
 __do_sys_bpf kernel/bpf/syscall.c:6085 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:6083 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6083
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f37dffe23a9
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff6bb80468 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00007fff6bb80648 RCX: 00007f37dffe23a9
RDX: 0000000000000045 RSI: 00002000002a0fb8 RDI: 0000000000000005
RBP: 00007f37e0055610 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000001
R13: 00007fff6bb80638 R14: 0000000000000001 R15: 0000000000000001
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-02  1:55 [syzbot] [bpf?] WARNING in reg_bounds_sanity_check syzbot
@ 2025-07-03 17:02 ` Paul Chaignon
  2025-07-03 18:54   ` Eduard Zingerman
  2025-07-05 16:02 ` syzbot
  1 sibling, 1 reply; 17+ messages in thread
From: Paul Chaignon @ 2025-07-03 17:02 UTC (permalink / raw)
  To: syzbot
  Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Tue, Jul 01, 2025 at 06:55:28PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    cce3fee729ee selftests/bpf: Enable dynptr/test_probe_read_..
> git tree:       bpf-next
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=147793d4580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
> dashboard link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
> compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1594e48c580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1159388c580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/f286a7ef4940/disk-cce3fee7.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/e2f2ebe1fdc3/vmlinux-cce3fee7.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/6e3070663778/bzImage-cce3fee7.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com
> 
> ------------[ cut here ]------------
> verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0] var_off=(0x0, 0x0)(1)
> WARNING: CPU: 1 PID: 5833 at kernel/bpf/verifier.c:2688 reg_bounds_sanity_check+0x6e6/0xc20 kernel/bpf/verifier.c:2682

I'm unsure how to handle this one.

One example repro is as follows.

  0: call bpf_get_netns_cookie
  1: if r0 == 0 goto <exit>
  2: if r0 & Oxffffffff goto <exit>

The issue is on the path where we fall through both jumps.

That path is unreachable at runtime: after insn 1, we know r0 != 0, but
with the sign extension on the jset, we would only fallthrough insn 2
if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
figure this out, so the verifier walks all branches. As a result, we end
up with inconsistent register ranges on this unreachable path:

  0: if r0 == 0 goto <exit>
    r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
  1: if r0 & 0xffffffff goto <exit>
    r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
    r0 after reg_bounds_sync:  u64=[0x1, 0] var_off=(0, 0)

I suspect there isn't anything specific to these two conditions, and
anytime we start walking an unreachable path, we may end up with
inconsistent register ranges. The number of times syzkaller is currently
hitting this (180 in 1.5 days) suggests there are many different ways to
reproduce.

We could teach is_branch_taken() about this case, but we probably won't
be able to cover all cases. We could stop warning on this, but then we
may also miss legitimate cases (i.e., invariants violations on reachable
paths). We could also teach reg_bounds_sync() to stop refining the
bounds before it gets inconsistent, but I'm unsure how useful that'd be.

Paul

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-03 17:02 ` Paul Chaignon
@ 2025-07-03 18:54   ` Eduard Zingerman
  2025-07-04 17:14     ` Paul Chaignon
  0 siblings, 1 reply; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-03 18:54 UTC (permalink / raw)
  To: Paul Chaignon, syzbot
  Cc: andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa, kpsingh,
	linux-kernel, martin.lau, netdev, sdf, song, syzkaller-bugs,
	yonghong.song

On Thu, 2025-07-03 at 19:02 +0200, Paul Chaignon wrote:
> On Tue, Jul 01, 2025 at 06:55:28PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    cce3fee729ee selftests/bpf: Enable dynptr/test_probe_read_..
> > git tree:       bpf-next
> > console+strace: https://syzkaller.appspot.com/x/log.txt?x=147793d4580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
> > compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1594e48c580000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1159388c580000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/f286a7ef4940/disk-cce3fee7.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/e2f2ebe1fdc3/vmlinux-cce3fee7.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/6e3070663778/bzImage-cce3fee7.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0] var_off=(0x0, 0x0)(1)
> > WARNING: CPU: 1 PID: 5833 at kernel/bpf/verifier.c:2688 reg_bounds_sanity_check+0x6e6/0xc20 kernel/bpf/verifier.c:2682
>
> I'm unsure how to handle this one.
>
> One example repro is as follows.
>
>   0: call bpf_get_netns_cookie
>   1: if r0 == 0 goto <exit>
>   2: if r0 & Oxffffffff goto <exit>
>
> The issue is on the path where we fall through both jumps.
>
> That path is unreachable at runtime: after insn 1, we know r0 != 0, but
> with the sign extension on the jset, we would only fallthrough insn 2
> if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
> figure this out, so the verifier walks all branches. As a result, we end
> up with inconsistent register ranges on this unreachable path:
>
>   0: if r0 == 0 goto <exit>
>     r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
>   1: if r0 & 0xffffffff goto <exit>
>     r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
>     r0 after reg_bounds_sync:  u64=[0x1, 0] var_off=(0, 0)
>
> I suspect there isn't anything specific to these two conditions, and
> anytime we start walking an unreachable path, we may end up with
> inconsistent register ranges. The number of times syzkaller is currently
> hitting this (180 in 1.5 days) suggests there are many different ways to
> reproduce.
>
> We could teach is_branch_taken() about this case, but we probably won't
> be able to cover all cases. We could stop warning on this, but then we
> may also miss legitimate cases (i.e., invariants violations on reachable
> paths). We could also teach reg_bounds_sync() to stop refining the
> bounds before it gets inconsistent, but I'm unsure how useful that'd be.

Hi Paul,

In general, I think that reg_bounds_sync() can be used as a substitute
for is_branch_taken() -> whenever an impossible range is produced,
the branch should be deemed impossible at runtime and abandoned.
If I recall correctly Andrii considered this too risky some time ago,
so this warning is in place to catch bugs.

Which leaves only the option to refine is_branch_taken().

I think is_branch_taken() modification should not be too complicated.
For JSET it only checks tnum, but does not take ranges into account.
Reasoning about ranges is something along the lines:
- for unsigned range a = b & CONST -> a is in [b_min & CONST, b_max & CONST];
- for signed ranged same thing, but consider two unsigned sub-ranges;
- for non CONST cases, I think same reasoning can apply, but more
  min/max combinations need to be explored.
- then check if zero is a member or 'a' range.

Wdyt?

> The number of times syzkaller is currently hitting this (180 in 1.5
> days) suggests there are many different ways to reproduce.

It is a bit inconvenient to read syzbot BPF reports at the moment,
because it us hard to figure out how the program looks like.
Do you happen to know how complicated would it be to modify syzbot
output to:
- produce a comment with BPF program
- generating reproducer with a flag, allowing to print level 2
  verifier log
?

Thanks,
Eduard

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-03 18:54   ` Eduard Zingerman
@ 2025-07-04 17:14     ` Paul Chaignon
  2025-07-04 17:26       ` Eduard Zingerman
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Chaignon @ 2025-07-04 17:14 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Thu, Jul 03, 2025 at 11:54:27AM -0700, Eduard Zingerman wrote:
> On Thu, 2025-07-03 at 19:02 +0200, Paul Chaignon wrote:
> > On Tue, Jul 01, 2025 at 06:55:28PM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    cce3fee729ee selftests/bpf: Enable dynptr/test_probe_read_..
> > > git tree:       bpf-next
> > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=147793d4580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
> > > compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1594e48c580000
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1159388c580000
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/f286a7ef4940/disk-cce3fee7.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/e2f2ebe1fdc3/vmlinux-cce3fee7.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/6e3070663778/bzImage-cce3fee7.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com
> > >
> > > ------------[ cut here ]------------
> > > verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0] var_off=(0x0, 0x0)(1)
> > > WARNING: CPU: 1 PID: 5833 at kernel/bpf/verifier.c:2688 reg_bounds_sanity_check+0x6e6/0xc20 kernel/bpf/verifier.c:2682
> >
> > I'm unsure how to handle this one.
> >
> > One example repro is as follows.
> >
> >   0: call bpf_get_netns_cookie
> >   1: if r0 == 0 goto <exit>
> >   2: if r0 & Oxffffffff goto <exit>
> >
> > The issue is on the path where we fall through both jumps.
> >
> > That path is unreachable at runtime: after insn 1, we know r0 != 0, but
> > with the sign extension on the jset, we would only fallthrough insn 2
> > if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
> > figure this out, so the verifier walks all branches. As a result, we end
> > up with inconsistent register ranges on this unreachable path:
> >
> >   0: if r0 == 0 goto <exit>
> >     r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
> >   1: if r0 & 0xffffffff goto <exit>
> >     r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
> >     r0 after reg_bounds_sync:  u64=[0x1, 0] var_off=(0, 0)
> >
> > I suspect there isn't anything specific to these two conditions, and
> > anytime we start walking an unreachable path, we may end up with
> > inconsistent register ranges. The number of times syzkaller is currently
> > hitting this (180 in 1.5 days) suggests there are many different ways to
> > reproduce.
> >
> > We could teach is_branch_taken() about this case, but we probably won't
> > be able to cover all cases. We could stop warning on this, but then we
> > may also miss legitimate cases (i.e., invariants violations on reachable
> > paths). We could also teach reg_bounds_sync() to stop refining the
> > bounds before it gets inconsistent, but I'm unsure how useful that'd be.
> 
> Hi Paul,
> 
> In general, I think that reg_bounds_sync() can be used as a substitute
> for is_branch_taken() -> whenever an impossible range is produced,
> the branch should be deemed impossible at runtime and abandoned.
> If I recall correctly Andrii considered this too risky some time ago,
> so this warning is in place to catch bugs.

Hi Eduard,

Yeah, that feels risky enough that I didn't even dare mention it as a
possibility :)

> 
> Which leaves only the option to refine is_branch_taken().
> 
> I think is_branch_taken() modification should not be too complicated.
> For JSET it only checks tnum, but does not take ranges into account.
> Reasoning about ranges is something along the lines:
> - for unsigned range a = b & CONST -> a is in [b_min & CONST, b_max & CONST];
> - for signed ranged same thing, but consider two unsigned sub-ranges;
> - for non CONST cases, I think same reasoning can apply, but more
>   min/max combinations need to be explored.
> - then check if zero is a member or 'a' range.
> 
> Wdyt?

I might be missing something, but I'm not sure that works. For the
unsigned range, if we have b & 0x2 with b in [2; 10], then we'd end up
with a in [2; 2] and would conclude that the jump is never taken. But
b=8 proves us wrong.

> 
> > The number of times syzkaller is currently hitting this (180 in 1.5
> > days) suggests there are many different ways to reproduce.
> 
> It is a bit inconvenient to read syzbot BPF reports at the moment,
> because it us hard to figure out how the program looks like.
> Do you happen to know how complicated would it be to modify syzbot
> output to:
> - produce a comment with BPF program
> - generating reproducer with a flag, allowing to print level 2
>   verifier log
> ?

I have the same thought sometimes. Right now, I add verifier logs to a
syz or C reproducer to see the program. Producing the BPF program in a
comment would likely be tricky as we'd need to maintain a disassembler
in syzkaller. Adding verifier logs to reproducers that contain
bpf(PROG_LOAD) calls seems easier. Then I guess we'd get that output in
the strace or console logs of syzbot.

> 
> Thanks,
> Eduard

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-04 17:14     ` Paul Chaignon
@ 2025-07-04 17:26       ` Eduard Zingerman
  2025-07-04 21:13         ` Eduard Zingerman
  2025-07-07 21:57         ` Paul Chaignon
  0 siblings, 2 replies; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-04 17:26 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Fri, 2025-07-04 at 19:14 +0200, Paul Chaignon wrote:
> On Thu, Jul 03, 2025 at 11:54:27AM -0700, Eduard Zingerman wrote:
> > On Thu, 2025-07-03 at 19:02 +0200, Paul Chaignon wrote:
> > > On Tue, Jul 01, 2025 at 06:55:28PM -0700, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzbot found the following issue on:
> > > > 
> > > > HEAD commit:    cce3fee729ee selftests/bpf: Enable dynptr/test_probe_read_..
> > > > git tree:       bpf-next
> > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=147793d4580000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
> > > > compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1594e48c580000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1159388c580000
> > > > 
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/f286a7ef4940/disk-cce3fee7.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/e2f2ebe1fdc3/vmlinux-cce3fee7.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/6e3070663778/bzImage-cce3fee7.xz
> > > > 
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com
> > > > 
> > > > ------------[ cut here ]------------
> > > > verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0] var_off=(0x0, 0x0)(1)
> > > > WARNING: CPU: 1 PID: 5833 at kernel/bpf/verifier.c:2688 reg_bounds_sanity_check+0x6e6/0xc20 kernel/bpf/verifier.c:2682
> > > 
> > > I'm unsure how to handle this one.
> > > 
> > > One example repro is as follows.
> > > 
> > >   0: call bpf_get_netns_cookie
> > >   1: if r0 == 0 goto <exit>
> > >   2: if r0 & Oxffffffff goto <exit>
> > > 
> > > The issue is on the path where we fall through both jumps.
> > > 
> > > That path is unreachable at runtime: after insn 1, we know r0 != 0, but
> > > with the sign extension on the jset, we would only fallthrough insn 2
> > > if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
> > > figure this out, so the verifier walks all branches. As a result, we end
> > > up with inconsistent register ranges on this unreachable path:
> > > 
> > >   0: if r0 == 0 goto <exit>
> > >     r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
> > >   1: if r0 & 0xffffffff goto <exit>
> > >     r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
> > >     r0 after reg_bounds_sync:  u64=[0x1, 0] var_off=(0, 0)
> > > 
> > > I suspect there isn't anything specific to these two conditions, and
> > > anytime we start walking an unreachable path, we may end up with
> > > inconsistent register ranges. The number of times syzkaller is currently
> > > hitting this (180 in 1.5 days) suggests there are many different ways to
> > > reproduce.
> > > 
> > > We could teach is_branch_taken() about this case, but we probably won't
> > > be able to cover all cases. We could stop warning on this, but then we
> > > may also miss legitimate cases (i.e., invariants violations on reachable
> > > paths). We could also teach reg_bounds_sync() to stop refining the
> > > bounds before it gets inconsistent, but I'm unsure how useful that'd be.
> > 
> > Hi Paul,
> > 
> > In general, I think that reg_bounds_sync() can be used as a substitute
> > for is_branch_taken() -> whenever an impossible range is produced,
> > the branch should be deemed impossible at runtime and abandoned.
> > If I recall correctly Andrii considered this too risky some time ago,
> > so this warning is in place to catch bugs.
> 
> Hi Eduard,
> 
> Yeah, that feels risky enough that I didn't even dare mention it as a
> possibility :)
> 
> > 
> > Which leaves only the option to refine is_branch_taken().
> > 
> > I think is_branch_taken() modification should not be too complicated.
> > For JSET it only checks tnum, but does not take ranges into account.
> > Reasoning about ranges is something along the lines:
> > - for unsigned range a = b & CONST -> a is in [b_min & CONST, b_max & CONST];
> > - for signed ranged same thing, but consider two unsigned sub-ranges;
> > - for non CONST cases, I think same reasoning can apply, but more
> >   min/max combinations need to be explored.
> > - then check if zero is a member or 'a' range.
> > 
> > Wdyt?
> 
> I might be missing something, but I'm not sure that works. For the
> unsigned range, if we have b & 0x2 with b in [2; 10], then we'd end up
> with a in [2; 2] and would conclude that the jump is never taken. But
> b=8 proves us wrong.

I see, what is really needed is an 'or' joined mask of all 'b' values.
I need to think how that can be obtained (or approximated).

> > 
> > > The number of times syzkaller is currently hitting this (180 in 1.5
> > > days) suggests there are many different ways to reproduce.
> > 
> > It is a bit inconvenient to read syzbot BPF reports at the moment,
> > because it us hard to figure out how the program looks like.
> > Do you happen to know how complicated would it be to modify syzbot
> > output to:
> > - produce a comment with BPF program
> > - generating reproducer with a flag, allowing to print level 2
> >   verifier log
> > ?
> 
> I have the same thought sometimes. Right now, I add verifier logs to a
> syz or C reproducer to see the program. Producing the BPF program in a
> comment would likely be tricky as we'd need to maintain a disassembler
> in syzkaller.

So, it operates on raw bytes, not on logical instructions?

> Adding verifier logs to reproducers that contain bpf(PROG_LOAD)
> calls seems easier. Then I guess we'd get that output in the strace
> or console logs of syzbot.

The log level 2 might be huge, so it shouldn't be enabled by default.
But not having to modify the reproducer before investigation would be
helpful.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-04 17:26       ` Eduard Zingerman
@ 2025-07-04 21:13         ` Eduard Zingerman
  2025-07-04 21:27           ` Eduard Zingerman
  2025-07-07 22:30           ` Paul Chaignon
  2025-07-07 21:57         ` Paul Chaignon
  1 sibling, 2 replies; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-04 21:13 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Fri, 2025-07-04 at 10:26 -0700, Eduard Zingerman wrote:
> On Fri, 2025-07-04 at 19:14 +0200, Paul Chaignon wrote:
> > On Thu, Jul 03, 2025 at 11:54:27AM -0700, Eduard Zingerman wrote:

[...]

> > > I think is_branch_taken() modification should not be too complicated.
> > > For JSET it only checks tnum, but does not take ranges into account.
> > > Reasoning about ranges is something along the lines:
> > > - for unsigned range a = b & CONST -> a is in [b_min & CONST, b_max & CONST];
> > > - for signed ranged same thing, but consider two unsigned sub-ranges;
> > > - for non CONST cases, I think same reasoning can apply, but more
> > >   min/max combinations need to be explored.
> > > - then check if zero is a member or 'a' range.
> > > 
> > > Wdyt?
> > 
> > I might be missing something, but I'm not sure that works. For the
> > unsigned range, if we have b & 0x2 with b in [2; 10], then we'd end up
> > with a in [2; 2] and would conclude that the jump is never taken. But
> > b=8 proves us wrong.
> 
> I see, what is really needed is an 'or' joined mask of all 'b' values.
> I need to think how that can be obtained (or approximated).

I think the mask can be computed as in or_range() function at the
bottom of the email. This gives the following algorithm, if only
unsigned range is considered:

- assume prediction is needed for "if a & b goto ..."
- bits that may be set in 'a' are or_range(a_min, a_max)
- bits that may be set in 'b' are or_range(b_min, b_max)
- if computed bit masks intersect: both branches are possible
- otherwise only false branch is possible.

Wdyt?

[...]

---

#include <stdint.h>
#include <stdio.h>

static uint64_t or_range(uint64_t lo, uint64_t hi)
{
  uint64_t m;
  uint32_t i;

  m = hi;
  i = 0;
  while (lo != hi) {
    m |= 1lu << i;
    lo >>= 1;
    hi >>= 1;
    i++;
  }
  return m;
}

static uint64_t or_range_simple(uint64_t lo, uint64_t hi)
{
  uint64_t m = 0;
  uint64_t v = 0;

  for (v = lo; v <= hi; v++)
    m |= v;
  return m;
}

int main(int argc, char *argv[])
{
  int max = 0x1000;
  for (int lo = 0; lo < max; lo++) {
    for (int hi = lo; hi < max; hi++) {
      uint64_t expected = or_range_simple(lo, hi);
      uint64_t result = or_range(lo, hi);

      if (expected != result) {
        printf("mismatch: %x..%x -> expecting %lx, result %lx\n",
               lo, hi, expected, result);
        return 1;
      }
    }
  }
  printf("all ok\n");
  return 0;
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-04 21:13         ` Eduard Zingerman
@ 2025-07-04 21:27           ` Eduard Zingerman
  2025-07-07 22:30           ` Paul Chaignon
  1 sibling, 0 replies; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-04 21:27 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Fri, 2025-07-04 at 14:13 -0700, Eduard Zingerman wrote:

[...]

> I think the mask can be computed as in or_range() function at the
> bottom of the email. This gives the following algorithm, if only
> unsigned range is considered:
> 
> - assume prediction is needed for "if a & b goto ..."
> - bits that may be set in 'a' are or_range(a_min, a_max)
> - bits that may be set in 'b' are or_range(b_min, b_max)
> - if computed bit masks intersect: both branches are possible
> - otherwise only false branch is possible.
> 
> Wdyt?

This does not help with problem at hand, however.
Possible bits set for range [0x1, 0xff] are 0xff,
so there would be no prediction.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-02  1:55 [syzbot] [bpf?] WARNING in reg_bounds_sanity_check syzbot
  2025-07-03 17:02 ` Paul Chaignon
@ 2025-07-05 16:02 ` syzbot
  1 sibling, 0 replies; 17+ messages in thread
From: syzbot @ 2025-07-05 16:02 UTC (permalink / raw)
  To: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, paul.chaignon, sdf,
	song, syzkaller-bugs, yonghong.song

syzbot has bisected this issue to:

commit 0df1a55afa832f463f9ad68ddc5de92230f1bc8a
Author: Paul Chaignon <paul.chaignon@gmail.com>
Date:   Tue Jul 1 18:36:15 2025 +0000

    bpf: Warn on internal verifier errors

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1693cf70580000
start commit:   cce3fee729ee selftests/bpf: Enable dynptr/test_probe_read_..
git tree:       bpf-next
final oops:     https://syzkaller.appspot.com/x/report.txt?x=1593cf70580000
console output: https://syzkaller.appspot.com/x/log.txt?x=1193cf70580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
dashboard link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1594e48c580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1159388c580000

Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com
Fixes: 0df1a55afa83 ("bpf: Warn on internal verifier errors")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-04 17:26       ` Eduard Zingerman
  2025-07-04 21:13         ` Eduard Zingerman
@ 2025-07-07 21:57         ` Paul Chaignon
  2025-07-07 23:36           ` Eduard Zingerman
  1 sibling, 1 reply; 17+ messages in thread
From: Paul Chaignon @ 2025-07-07 21:57 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Fri, Jul 04, 2025 at 10:26:14AM -0700, Eduard Zingerman wrote:
> On Fri, 2025-07-04 at 19:14 +0200, Paul Chaignon wrote:
> > On Thu, Jul 03, 2025 at 11:54:27AM -0700, Eduard Zingerman wrote:
> > > On Thu, 2025-07-03 at 19:02 +0200, Paul Chaignon wrote:
> > > > The number of times syzkaller is currently hitting this (180 in 1.5
> > > > days) suggests there are many different ways to reproduce.
> > > 
> > > It is a bit inconvenient to read syzbot BPF reports at the moment,
> > > because it us hard to figure out how the program looks like.
> > > Do you happen to know how complicated would it be to modify syzbot
> > > output to:
> > > - produce a comment with BPF program
> > > - generating reproducer with a flag, allowing to print level 2
> > >   verifier log
> > > ?
> > 
> > I have the same thought sometimes. Right now, I add verifier logs to a
> > syz or C reproducer to see the program. Producing the BPF program in a
> > comment would likely be tricky as we'd need to maintain a disassembler
> > in syzkaller.
> 
> So, it operates on raw bytes, not on logical instructions?

Both I would say. The syzkaller descriptions for BPF are structured
around instructions [1], though they may not always match 1:1 with
upstream instructions. Syzkaller then mutates raw bytes, taking some
information from the descriptions into account (ex. known flag values).

1 - https://github.com/google/syzkaller/blob/master/sys/linux/bpf_prog.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-04 21:13         ` Eduard Zingerman
  2025-07-04 21:27           ` Eduard Zingerman
@ 2025-07-07 22:30           ` Paul Chaignon
  2025-07-07 23:29             ` Eduard Zingerman
  1 sibling, 1 reply; 17+ messages in thread
From: Paul Chaignon @ 2025-07-07 22:30 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Fri, Jul 04, 2025 at 02:13:15PM -0700, Eduard Zingerman wrote:
> On Fri, 2025-07-04 at 10:26 -0700, Eduard Zingerman wrote:
> > On Fri, 2025-07-04 at 19:14 +0200, Paul Chaignon wrote:
> > > On Thu, Jul 03, 2025 at 11:54:27AM -0700, Eduard Zingerman wrote:
> 
> [...]
> 
> > > > I think is_branch_taken() modification should not be too complicated.
> > > > For JSET it only checks tnum, but does not take ranges into account.
> > > > Reasoning about ranges is something along the lines:
> > > > - for unsigned range a = b & CONST -> a is in [b_min & CONST, b_max & CONST];
> > > > - for signed ranged same thing, but consider two unsigned sub-ranges;
> > > > - for non CONST cases, I think same reasoning can apply, but more
> > > >   min/max combinations need to be explored.
> > > > - then check if zero is a member or 'a' range.
> > > > 
> > > > Wdyt?
> > > 
> > > I might be missing something, but I'm not sure that works. For the
> > > unsigned range, if we have b & 0x2 with b in [2; 10], then we'd end up
> > > with a in [2; 2] and would conclude that the jump is never taken. But
> > > b=8 proves us wrong.
> > 
> > I see, what is really needed is an 'or' joined mask of all 'b' values.
> > I need to think how that can be obtained (or approximated).
> 
> I think the mask can be computed as in or_range() function at the
> bottom of the email. This gives the following algorithm, if only
> unsigned range is considered:
> 
> - assume prediction is needed for "if a & b goto ..."
> - bits that may be set in 'a' are or_range(a_min, a_max)
> - bits that may be set in 'b' are or_range(b_min, b_max)
> - if computed bit masks intersect: both branches are possible
> - otherwise only false branch is possible.
> 
> Wdyt?

This is really nice! I think we can extend it to detect some
always-true branches as well, and thus handle the initial case reported
by syzbot.

- if a_min == 0: we don't deduce anything
- bits that may be set in 'a' are: possible_a = or_range(a_min, a_max)
- bits that are always set in 'b' are: always_b = b_value & ~b_mask
- if possible_a & always_b == possible_a: only true branch is possible
- otherwise, we can't deduce anything

For BPF_X case, we probably want to also check the reverse with
possible_b & always_a.

---

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

static uint64_t or_range(uint64_t lo, uint64_t hi)
{
  uint64_t m;
  uint32_t i;

  m = hi;
  i = 0;
  while (lo != hi) {
    m |= 1lu << i;
    lo >>= 1;
    hi >>= 1;
    i++;
  }
  return m;
}

static bool always_matches(uint64_t lo, uint64_t hi, uint64_t mask)
{
  uint64_t possible_bits = or_range(lo, hi);
  return possible_bits & mask == possible_bits;
}

static bool always_matches_naive(uint64_t lo, uint64_t hi, uint64_t mask)
{
  uint64_t v = 0;

  for (v = lo; v <= hi; v++) {
    if (!(v & mask)) {
      return false;
    }
  }
  return true;
}

int main(int argc, char *argv[])
{
  int max = 0x300;

  for (int mask = 0; mask < max; mask++) {
    for (int lo = 1; lo < max; lo++) {
      for (int hi = lo; hi < max; hi++) {
        bool expected = always_matches_naive(lo, hi, mask);
        bool result = always_matches(lo, hi, mask);

        if (result == true && expected == false) {
          printf("mismatch: %x..%x & %x -> expecting %d, result %d\n",
                 lo, hi, mask, expected, result);
          return 1;
        }
      }
    }
  }
  printf("all ok\n");
  return 0;
}


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-07 22:30           ` Paul Chaignon
@ 2025-07-07 23:29             ` Eduard Zingerman
  2025-07-08  0:37               ` Eduard Zingerman
  0 siblings, 1 reply; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-07 23:29 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Tue, 2025-07-08 at 00:30 +0200, Paul Chaignon wrote:

[...]

> This is really nice! I think we can extend it to detect some
> always-true branches as well, and thus handle the initial case reported
> by syzbot.
>
> - if a_min == 0: we don't deduce anything
> - bits that may be set in 'a' are: possible_a = or_range(a_min, a_max)
> - bits that are always set in 'b' are: always_b = b_value & ~b_mask
> - if possible_a & always_b == possible_a: only true branch is possible
> - otherwise, we can't deduce anything
>
> For BPF_X case, we probably want to also check the reverse with
> possible_b & always_a.

So, this would extend existing predictions:
- [old] always_a & always_b -> infer always true
- [old] !(possible_a & possible_b) -> infer always false
- [new] if possible_a & always_b == possible_a -> infer true
        (but make sure 0 is not in possible_a)

And it so happens, that it covers example at hand.
Note that or_range(1, (u64)-1) == (u64)-1, so maybe tnum would be
sufficient, w/o the need for or_range().

The part of the verifier that narrows the range after prediction:

  regs_refine_cond_op:

         case BPF_JSET | BPF_X: /* reverse of BPF_JSET, see rev_opcode() */
                 if (!is_reg_const(reg: reg2, subreg32: is_jmp32))
                         swap(reg1, reg2);
                 if (!is_reg_const(reg: reg2, subreg32: is_jmp32))
                         break;
                 val = reg_const_value(reg: reg2, subreg32: is_jmp32);
		 ...
                         reg1->var_off = tnum_and(a: reg1->var_off, b: tnum_const(value: ~val));
		 ...
                 break;

And after suggested change this part would be executed only if tnum
bounds can be changed by jset. So, this eliminates at-least a
sub-class of a problem.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-07 21:57         ` Paul Chaignon
@ 2025-07-07 23:36           ` Eduard Zingerman
  0 siblings, 0 replies; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-07 23:36 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Mon, 2025-07-07 at 23:57 +0200, Paul Chaignon wrote:
> On Fri, Jul 04, 2025 at 10:26:14AM -0700, Eduard Zingerman wrote:
> > On Fri, 2025-07-04 at 19:14 +0200, Paul Chaignon wrote:
> > > On Thu, Jul 03, 2025 at 11:54:27AM -0700, Eduard Zingerman wrote:
> > > > On Thu, 2025-07-03 at 19:02 +0200, Paul Chaignon wrote:
> > > > > The number of times syzkaller is currently hitting this (180 in 1.5
> > > > > days) suggests there are many different ways to reproduce.
> > > > 
> > > > It is a bit inconvenient to read syzbot BPF reports at the moment,
> > > > because it us hard to figure out how the program looks like.
> > > > Do you happen to know how complicated would it be to modify syzbot
> > > > output to:
> > > > - produce a comment with BPF program
> > > > - generating reproducer with a flag, allowing to print level 2
> > > >   verifier log
> > > > ?
> > > 
> > > I have the same thought sometimes. Right now, I add verifier logs to a
> > > syz or C reproducer to see the program. Producing the BPF program in a
> > > comment would likely be tricky as we'd need to maintain a disassembler
> > > in syzkaller.
> > 
> > So, it operates on raw bytes, not on logical instructions?
> 
> Both I would say. The syzkaller descriptions for BPF are structured
> around instructions [1], though they may not always match 1:1 with
> upstream instructions. Syzkaller then mutates raw bytes, taking some
> information from the descriptions into account (ex. known flag values).
> 
> 1 - https://github.com/google/syzkaller/blob/master/sys/linux/bpf_prog.txt

I actually took a brief look at syzkaller over the weekend but got
lost tbh.  BPF disassembler is small (~400Loc in kernel/bpf/disasm.c).
I can teach myself some golang and make a copy of it in syzkaller,
but having some guidance on where to put/call this code would be of
much help. (I think that having program code in the report would be
of great help in triaging).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-07 23:29             ` Eduard Zingerman
@ 2025-07-08  0:37               ` Eduard Zingerman
  2025-07-08  0:51                 ` Alexei Starovoitov
  0 siblings, 1 reply; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-08  0:37 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: syzbot, andrii, ast, bpf, daniel, haoluo, john.fastabend, jolsa,
	kpsingh, linux-kernel, martin.lau, netdev, sdf, song,
	syzkaller-bugs, yonghong.song

On Mon, 2025-07-07 at 16:29 -0700, Eduard Zingerman wrote:
> On Tue, 2025-07-08 at 00:30 +0200, Paul Chaignon wrote:
>
> [...]
>
> > This is really nice! I think we can extend it to detect some
> > always-true branches as well, and thus handle the initial case reported
> > by syzbot.
> >
> > - if a_min == 0: we don't deduce anything
> > - bits that may be set in 'a' are: possible_a = or_range(a_min, a_max)
> > - bits that are always set in 'b' are: always_b = b_value & ~b_mask
> > - if possible_a & always_b == possible_a: only true branch is possible
> > - otherwise, we can't deduce anything
> >
> > For BPF_X case, we probably want to also check the reverse with
> > possible_b & always_a.
>
> So, this would extend existing predictions:
> - [old] always_a & always_b -> infer always true
> - [old] !(possible_a & possible_b) -> infer always false
> - [new] if possible_a & always_b == possible_a -> infer true
>         (but make sure 0 is not in possible_a)
>
> And it so happens, that it covers example at hand.
> Note that or_range(1, (u64)-1) == (u64)-1, so maybe tnum would be
> sufficient, w/o the need for or_range().
>
> The part of the verifier that narrows the range after prediction:
>
>   regs_refine_cond_op:
>
>          case BPF_JSET | BPF_X: /* reverse of BPF_JSET, see rev_opcode() */
>                  if (!is_reg_const(reg: reg2, subreg32: is_jmp32))
>                          swap(reg1, reg2);
>                  if (!is_reg_const(reg: reg2, subreg32: is_jmp32))
>                          break;
>                  val = reg_const_value(reg: reg2, subreg32: is_jmp32);
> 		 ...
>                          reg1->var_off = tnum_and(a: reg1->var_off, b: tnum_const(value: ~val));
> 		 ...
>                  break;
>
> And after suggested change this part would be executed only if tnum
> bounds can be changed by jset. So, this eliminates at-least a
> sub-class of a problem.

But I think the program below would still be problematic:

SEC("socket")
__success
__retval(0)
__naked void jset_bug1(void)
{
        asm volatile ("                                 \
        call %[bpf_get_prandom_u32];                    \
        if r0 < 2 goto 1f;                              \
        r0 |= 1;                                        \
        if r0 & -2 goto 1f;                             \
1:      r0 = 0;                                         \
        exit;                                           \
"       :
        : __imm(bpf_get_prandom_u32)
        : __clobber_all);
}

The possible_r0 would be changed by `if r0 & -2`, so new rule will not hit.
And the problem remains unsolved. I think we need to reset min/max
bounds in regs_refine_cond_op for JSET:
- in some cases range is more precise than tnum
- in these cases range cannot be compressed to a tnum
- predictions in jset are done for a tnum
- to avoid issues when narrowing tnum after prediction, forget the
  range.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-08  0:37               ` Eduard Zingerman
@ 2025-07-08  0:51                 ` Alexei Starovoitov
  2025-07-08  0:57                   ` Eduard Zingerman
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2025-07-08  0:51 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Paul Chaignon, syzbot, Andrii Nakryiko, Alexei Starovoitov, bpf,
	Daniel Borkmann, Hao Luo, John Fastabend, Jiri Olsa, KP Singh,
	LKML, Martin KaFai Lau, Network Development, Stanislav Fomichev,
	Song Liu, syzkaller-bugs, Yonghong Song

On Mon, Jul 7, 2025 at 5:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2025-07-07 at 16:29 -0700, Eduard Zingerman wrote:
> > On Tue, 2025-07-08 at 00:30 +0200, Paul Chaignon wrote:
> >
> > [...]
> >
> > > This is really nice! I think we can extend it to detect some
> > > always-true branches as well, and thus handle the initial case reported
> > > by syzbot.
> > >
> > > - if a_min == 0: we don't deduce anything
> > > - bits that may be set in 'a' are: possible_a = or_range(a_min, a_max)
> > > - bits that are always set in 'b' are: always_b = b_value & ~b_mask
> > > - if possible_a & always_b == possible_a: only true branch is possible
> > > - otherwise, we can't deduce anything
> > >
> > > For BPF_X case, we probably want to also check the reverse with
> > > possible_b & always_a.
> >
> > So, this would extend existing predictions:
> > - [old] always_a & always_b -> infer always true
> > - [old] !(possible_a & possible_b) -> infer always false
> > - [new] if possible_a & always_b == possible_a -> infer true
> >         (but make sure 0 is not in possible_a)
> >
> > And it so happens, that it covers example at hand.
> > Note that or_range(1, (u64)-1) == (u64)-1, so maybe tnum would be
> > sufficient, w/o the need for or_range().
> >
> > The part of the verifier that narrows the range after prediction:
> >
> >   regs_refine_cond_op:
> >
> >          case BPF_JSET | BPF_X: /* reverse of BPF_JSET, see rev_opcode() */
> >                  if (!is_reg_const(reg: reg2, subreg32: is_jmp32))
> >                          swap(reg1, reg2);
> >                  if (!is_reg_const(reg: reg2, subreg32: is_jmp32))
> >                          break;
> >                  val = reg_const_value(reg: reg2, subreg32: is_jmp32);
> >                ...
> >                          reg1->var_off = tnum_and(a: reg1->var_off, b: tnum_const(value: ~val));
> >                ...
> >                  break;
> >
> > And after suggested change this part would be executed only if tnum
> > bounds can be changed by jset. So, this eliminates at-least a
> > sub-class of a problem.
>
> But I think the program below would still be problematic:
>
> SEC("socket")
> __success
> __retval(0)
> __naked void jset_bug1(void)
> {
>         asm volatile ("                                 \
>         call %[bpf_get_prandom_u32];                    \
>         if r0 < 2 goto 1f;                              \
>         r0 |= 1;                                        \
>         if r0 & -2 goto 1f;                             \
> 1:      r0 = 0;                                         \
>         exit;                                           \
> "       :
>         : __imm(bpf_get_prandom_u32)
>         : __clobber_all);
> }
>
> The possible_r0 would be changed by `if r0 & -2`, so new rule will not hit.
> And the problem remains unsolved. I think we need to reset min/max
> bounds in regs_refine_cond_op for JSET:
> - in some cases range is more precise than tnum
> - in these cases range cannot be compressed to a tnum
> - predictions in jset are done for a tnum
> - to avoid issues when narrowing tnum after prediction, forget the
>   range.

You're digging too deep. llvm doesn't generate JSET insn,
so this is syzbot only issue. Let's address it with minimal changes.
Do not introduce fancy branch taken analysis.
I would be fine with reverting this particular verifier_bug() hunk.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-08  0:51                 ` Alexei Starovoitov
@ 2025-07-08  0:57                   ` Eduard Zingerman
  2025-07-08 16:19                     ` Paul Chaignon
  0 siblings, 1 reply; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-08  0:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Paul Chaignon, syzbot, Andrii Nakryiko, Alexei Starovoitov, bpf,
	Daniel Borkmann, Hao Luo, John Fastabend, Jiri Olsa, KP Singh,
	LKML, Martin KaFai Lau, Network Development, Stanislav Fomichev,
	Song Liu, syzkaller-bugs, Yonghong Song

On Mon, 2025-07-07 at 17:51 -0700, Alexei Starovoitov wrote:
> On Mon, Jul 7, 2025 at 5:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Mon, 2025-07-07 at 16:29 -0700, Eduard Zingerman wrote:
> > > On Tue, 2025-07-08 at 00:30 +0200, Paul Chaignon wrote:
> > >
> > > [...]
> > >
> > > > This is really nice! I think we can extend it to detect some
> > > > always-true branches as well, and thus handle the initial case reported
> > > > by syzbot.
> > > >
> > > > - if a_min == 0: we don't deduce anything
> > > > - bits that may be set in 'a' are: possible_a = or_range(a_min, a_max)
> > > > - bits that are always set in 'b' are: always_b = b_value & ~b_mask
> > > > - if possible_a & always_b == possible_a: only true branch is possible
> > > > - otherwise, we can't deduce anything
> > > >
> > > > For BPF_X case, we probably want to also check the reverse with
> > > > possible_b & always_a.
> > >
> > > So, this would extend existing predictions:
> > > - [old] always_a & always_b -> infer always true
> > > - [old] !(possible_a & possible_b) -> infer always false
> > > - [new] if possible_a & always_b == possible_a -> infer true
> > >         (but make sure 0 is not in possible_a)
> > >
> > > And it so happens, that it covers example at hand.
> > > Note that or_range(1, (u64)-1) == (u64)-1, so maybe tnum would be
> > > sufficient, w/o the need for or_range().
> > >
> > > The part of the verifier that narrows the range after prediction:
> > >
> > >   regs_refine_cond_op:
> > >
> > >          case BPF_JSET | BPF_X: /* reverse of BPF_JSET, see rev_opcode() */
> > >                  if (!is_reg_const(reg: reg2, subreg32: is_jmp32))
> > >                          swap(reg1, reg2);
> > >                  if (!is_reg_const(reg: reg2, subreg32: is_jmp32))
> > >                          break;
> > >                  val = reg_const_value(reg: reg2, subreg32: is_jmp32);
> > >                ...
> > >                          reg1->var_off = tnum_and(a: reg1->var_off, b: tnum_const(value: ~val));
> > >                ...
> > >                  break;
> > >
> > > And after suggested change this part would be executed only if tnum
> > > bounds can be changed by jset. So, this eliminates at-least a
> > > sub-class of a problem.
> >
> > But I think the program below would still be problematic:
> >
> > SEC("socket")
> > __success
> > __retval(0)
> > __naked void jset_bug1(void)
> > {
> >         asm volatile ("                                 \
> >         call %[bpf_get_prandom_u32];                    \
> >         if r0 < 2 goto 1f;                              \
> >         r0 |= 1;                                        \
> >         if r0 & -2 goto 1f;                             \
> > 1:      r0 = 0;                                         \
> >         exit;                                           \
> > "       :
> >         : __imm(bpf_get_prandom_u32)
> >         : __clobber_all);
> > }
> >
> > The possible_r0 would be changed by `if r0 & -2`, so new rule will not hit.
> > And the problem remains unsolved. I think we need to reset min/max
> > bounds in regs_refine_cond_op for JSET:
> > - in some cases range is more precise than tnum
> > - in these cases range cannot be compressed to a tnum
> > - predictions in jset are done for a tnum
> > - to avoid issues when narrowing tnum after prediction, forget the
> >   range.
>
> You're digging too deep. llvm doesn't generate JSET insn,
> so this is syzbot only issue. Let's address it with minimal changes.
> Do not introduce fancy branch taken analysis.
> I would be fine with reverting this particular verifier_bug() hunk.

My point is that the fix should look as below (but extract it as a
utility function):

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 53007182b46b..b2fe665901b7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16207,6 +16207,14 @@ static void regs_refine_cond_op(struct bpf_reg_state *reg1, struct bpf_reg_state
                        swap(reg1, reg2);
                if (!is_reg_const(reg2, is_jmp32))
                        break;
+               reg1->u32_max_value = U32_MAX;
+               reg1->u32_min_value = 0;
+               reg1->s32_max_value = S32_MAX;
+               reg1->s32_min_value = S32_MIN;
+               reg1->umax_value = U64_MAX;
+               reg1->umin_value = 0;
+               reg1->smax_value = S64_MAX;
+               reg1->smin_value = S32_MIN;
                val = reg_const_value(reg2, is_jmp32);
                if (is_jmp32) {
                        t = tnum_and(tnum_subreg(reg1->var_off), tnum_const(~val));

----

Because of irreconcilable differences in what can be represented as a
tnum and what can be represented as a range.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-08  0:57                   ` Eduard Zingerman
@ 2025-07-08 16:19                     ` Paul Chaignon
  2025-07-08 17:39                       ` Eduard Zingerman
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Chaignon @ 2025-07-08 16:19 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, syzbot, Andrii Nakryiko, Alexei Starovoitov,
	bpf, Daniel Borkmann, Hao Luo, John Fastabend, Jiri Olsa,
	KP Singh, LKML, Martin KaFai Lau, Network Development,
	Stanislav Fomichev, Song Liu, syzkaller-bugs, Yonghong Song

On Mon, Jul 07, 2025 at 05:57:32PM -0700, Eduard Zingerman wrote:
> On Mon, 2025-07-07 at 17:51 -0700, Alexei Starovoitov wrote:
> > On Mon, Jul 7, 2025 at 5:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Mon, 2025-07-07 at 16:29 -0700, Eduard Zingerman wrote:
> > > > On Tue, 2025-07-08 at 00:30 +0200, Paul Chaignon wrote:

[...]

> > > But I think the program below would still be problematic:
> > >
> > > SEC("socket")
> > > __success
> > > __retval(0)
> > > __naked void jset_bug1(void)
> > > {
> > >         asm volatile ("                                 \
> > >         call %[bpf_get_prandom_u32];                    \
> > >         if r0 < 2 goto 1f;                              \
> > >         r0 |= 1;                                        \
> > >         if r0 & -2 goto 1f;                             \
> > > 1:      r0 = 0;                                         \
> > >         exit;                                           \
> > > "       :
> > >         : __imm(bpf_get_prandom_u32)
> > >         : __clobber_all);
> > > }
> > >
> > > The possible_r0 would be changed by `if r0 & -2`, so new rule will not hit.
> > > And the problem remains unsolved. I think we need to reset min/max
> > > bounds in regs_refine_cond_op for JSET:
> > > - in some cases range is more precise than tnum
> > > - in these cases range cannot be compressed to a tnum
> > > - predictions in jset are done for a tnum
> > > - to avoid issues when narrowing tnum after prediction, forget the
> > >   range.
> >
> > You're digging too deep. llvm doesn't generate JSET insn,
> > so this is syzbot only issue. Let's address it with minimal changes.
> > Do not introduce fancy branch taken analysis.
> > I would be fine with reverting this particular verifier_bug() hunk.

Ok, if LLVM doesn't generate JSETs, I agree there's not much point
trying to reduce false positives. I like Eduard's solution below
because it handles the JSET case without removing the warning. Given
the number of crashes syzkaller is generating, I suspect this isn't
only about JSET, so it'd be good to keep some visibility into invariant
violations.

> 
> My point is that the fix should look as below (but extract it as a
> utility function):
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 53007182b46b..b2fe665901b7 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -16207,6 +16207,14 @@ static void regs_refine_cond_op(struct bpf_reg_state *reg1, struct bpf_reg_state
>                         swap(reg1, reg2);
>                 if (!is_reg_const(reg2, is_jmp32))
>                         break;
> +               reg1->u32_max_value = U32_MAX;
> +               reg1->u32_min_value = 0;
> +               reg1->s32_max_value = S32_MAX;
> +               reg1->s32_min_value = S32_MIN;
> +               reg1->umax_value = U64_MAX;
> +               reg1->umin_value = 0;
> +               reg1->smax_value = S64_MAX;
> +               reg1->smin_value = S32_MIN;

Looks like __mark_reg_unbounded :)

I can send a test case + __mark_reg_unbounded for BPF_JSET | BPF_X in
regs_refine_cond_op. I suspect we may need the same for the BPF_JSET
case as well, but I'm unable to build a repro for that so far.

>                 val = reg_const_value(reg2, is_jmp32);
>                 if (is_jmp32) {
>                         t = tnum_and(tnum_subreg(reg1->var_off), tnum_const(~val));
> 
> ----
> 
> Because of irreconcilable differences in what can be represented as a
> tnum and what can be represented as a range.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check
  2025-07-08 16:19                     ` Paul Chaignon
@ 2025-07-08 17:39                       ` Eduard Zingerman
  0 siblings, 0 replies; 17+ messages in thread
From: Eduard Zingerman @ 2025-07-08 17:39 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: Alexei Starovoitov, syzbot, Andrii Nakryiko, Alexei Starovoitov,
	bpf, Daniel Borkmann, Hao Luo, John Fastabend, Jiri Olsa,
	KP Singh, LKML, Martin KaFai Lau, Network Development,
	Stanislav Fomichev, Song Liu, syzkaller-bugs, Yonghong Song

On Tue, 2025-07-08 at 18:19 +0200, Paul Chaignon wrote:
> On Mon, Jul 07, 2025 at 05:57:32PM -0700, Eduard Zingerman wrote:
> > On Mon, 2025-07-07 at 17:51 -0700, Alexei Starovoitov wrote:
> > > On Mon, Jul 7, 2025 at 5:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > 
> > > > On Mon, 2025-07-07 at 16:29 -0700, Eduard Zingerman wrote:
> > > > > On Tue, 2025-07-08 at 00:30 +0200, Paul Chaignon wrote:
> 
> [...]
> 
> > > > But I think the program below would still be problematic:
> > > > 
> > > > SEC("socket")
> > > > __success
> > > > __retval(0)
> > > > __naked void jset_bug1(void)
> > > > {
> > > >         asm volatile ("                                 \
> > > >         call %[bpf_get_prandom_u32];                    \
> > > >         if r0 < 2 goto 1f;                              \
> > > >         r0 |= 1;                                        \
> > > >         if r0 & -2 goto 1f;                             \
> > > > 1:      r0 = 0;                                         \
> > > >         exit;                                           \
> > > > "       :
> > > >         : __imm(bpf_get_prandom_u32)
> > > >         : __clobber_all);
> > > > }
> > > > 
> > > > The possible_r0 would be changed by `if r0 & -2`, so new rule will not hit.
> > > > And the problem remains unsolved. I think we need to reset min/max
> > > > bounds in regs_refine_cond_op for JSET:
> > > > - in some cases range is more precise than tnum
> > > > - in these cases range cannot be compressed to a tnum
> > > > - predictions in jset are done for a tnum
> > > > - to avoid issues when narrowing tnum after prediction, forget the
> > > >   range.
> > > 
> > > You're digging too deep. llvm doesn't generate JSET insn,
> > > so this is syzbot only issue. Let's address it with minimal changes.
> > > Do not introduce fancy branch taken analysis.
> > > I would be fine with reverting this particular verifier_bug() hunk.
> 
> Ok, if LLVM doesn't generate JSETs, I agree there's not much point
> trying to reduce false positives. I like Eduard's solution below
> because it handles the JSET case without removing the warning. Given
> the number of crashes syzkaller is generating, I suspect this isn't
> only about JSET, so it'd be good to keep some visibility into invariant
> violations.

I suspect similar problems might be found in any place where tnum
operations are used to narrow the range. E.g. if a repro for JSET
would be found, same repro might be applicable to BPF_AND.

In general, it might be the case we should not treat out of sync
bounds as an error. Assuming that tnum and bounds based ranges have
different precision in different scale regions, situations when
one bound is changed w/o changing another can be legit. E.g.:

                              ____ bounds range ____
                             /                      \
0 --------------------------------------------------------- MAX
    \___________________________________________________/
          tnum range

Narrowing only tnum:
                              ____ bounds range ____
                             /                      \
0 --------------------------------------------------------- MAX
    \___________________/
          tnum range

This does not highlight an error, but a difference in expressive power
for specific values.

> > My point is that the fix should look as below (but extract it as a
> > utility function):
> > 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 53007182b46b..b2fe665901b7 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -16207,6 +16207,14 @@ static void regs_refine_cond_op(struct bpf_reg_state *reg1, struct bpf_reg_state
> >                         swap(reg1, reg2);
> >                 if (!is_reg_const(reg2, is_jmp32))
> >                         break;
> > +               reg1->u32_max_value = U32_MAX;
> > +               reg1->u32_min_value = 0;
> > +               reg1->s32_max_value = S32_MAX;
> > +               reg1->s32_min_value = S32_MIN;
> > +               reg1->umax_value = U64_MAX;
> > +               reg1->umin_value = 0;
> > +               reg1->smax_value = S64_MAX;
> > +               reg1->smin_value = S32_MIN;
> 
> Looks like __mark_reg_unbounded :)

I suspected there should be something already :)

> I can send a test case + __mark_reg_unbounded for BPF_JSET | BPF_X in
> regs_refine_cond_op. I suspect we may need the same for the BPF_JSET
> case as well, but I'm unable to build a repro for that so far.

Please go ahead.

> 
> >                 val = reg_const_value(reg2, is_jmp32);
> >                 if (is_jmp32) {
> >                         t = tnum_and(tnum_subreg(reg1->var_off), tnum_const(~val));
> > 
> > ----
> > 
> > Because of irreconcilable differences in what can be represented as a
> > tnum and what can be represented as a range.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-07-08 17:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-02  1:55 [syzbot] [bpf?] WARNING in reg_bounds_sanity_check syzbot
2025-07-03 17:02 ` Paul Chaignon
2025-07-03 18:54   ` Eduard Zingerman
2025-07-04 17:14     ` Paul Chaignon
2025-07-04 17:26       ` Eduard Zingerman
2025-07-04 21:13         ` Eduard Zingerman
2025-07-04 21:27           ` Eduard Zingerman
2025-07-07 22:30           ` Paul Chaignon
2025-07-07 23:29             ` Eduard Zingerman
2025-07-08  0:37               ` Eduard Zingerman
2025-07-08  0:51                 ` Alexei Starovoitov
2025-07-08  0:57                   ` Eduard Zingerman
2025-07-08 16:19                     ` Paul Chaignon
2025-07-08 17:39                       ` Eduard Zingerman
2025-07-07 21:57         ` Paul Chaignon
2025-07-07 23:36           ` Eduard Zingerman
2025-07-05 16:02 ` syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).