[PATCH bpf-next v3 0/2] bpf: Recognize special arithmetic shift in the verifier

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH bpf-next v3 0/2] bpf: Recognize special arithmetic shift in the verifier
@ 2026-01-03  2:23 Puranjay Mohan
  2026-01-03  2:23 ` [PATCH bpf-next v3 1/2] " Puranjay Mohan
  2026-01-03  2:23 ` [PATCH bpf-next v3 2/2] selftests/bpf: Add tests for s>>=31 and s>>=63 Puranjay Mohan
  0 siblings, 2 replies; 13+ messages in thread
From: Puranjay Mohan @ 2026-01-03  2:23 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	kernel-team, sunhao.th

v2: https://lore.kernel.org/bpf/20251115022611.64898-1-alexei.starovoitov@gmail.com/
Changes in v2->v3:
- fork verifier state while processing BPF_AND when src_reg has [-1,0]
  range and 2nd operand is a constant.

v1->v2:
Use __mark_reg32_known() or __mark_reg_known() for zero too.
Add comment to selftest.

v1:
https://lore.kernel.org/bpf/20251114031039.63852-1-alexei.starovoitov@gmail.com/

Alexei Starovoitov (2):
  bpf: Recognize special arithmetic shift in the verifier
  selftests/bpf: Add tests for s>>=31 and s>>=63

 kernel/bpf/verifier.c                         | 34 ++++++++++++++
 .../selftests/bpf/progs/verifier_subreg.c     | 45 +++++++++++++++++++
 2 files changed, 79 insertions(+)


base-commit: 7694ff8f6ca7f7cdf2e5636ecbe6dacaeb71678d
-- 
2.47.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-03  2:23 [PATCH bpf-next v3 0/2] bpf: Recognize special arithmetic shift in the verifier Puranjay Mohan
@ 2026-01-03  2:23 ` Puranjay Mohan
  2026-01-06 17:21   ` Andrii Nakryiko
  2026-01-08  1:03   ` Eduard Zingerman
  2026-01-03  2:23 ` [PATCH bpf-next v3 2/2] selftests/bpf: Add tests for s>>=31 and s>>=63 Puranjay Mohan
  1 sibling, 2 replies; 13+ messages in thread
From: Puranjay Mohan @ 2026-01-03  2:23 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	kernel-team, sunhao.th

From: Alexei Starovoitov <ast@kernel.org>

cilium bpf_wiregard.bpf.c when compiled with -O1 fails to load
with the following verifier log:

192: (79) r2 = *(u64 *)(r10 -304)     ; R2=pkt(r=40) R10=fp0 fp-304=pkt(r=40)
...
227: (85) call bpf_skb_store_bytes#9          ; R0=scalar()
228: (bc) w2 = w0                     ; R0=scalar() R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
229: (c4) w2 s>>= 31                  ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
230: (54) w2 &= -134                  ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))
...
232: (66) if w2 s> 0xffffffff goto pc+125     ; R2=scalar(smin=umin=umin32=0x80000000,smax=umax=umax32=0xffffff7a,smax32=-134,var_off=(0x80000000; 0x7fffff7a))
...
238: (79) r4 = *(u64 *)(r10 -304)     ; R4=scalar() R10=fp0 fp-304=scalar()
239: (56) if w2 != 0xffffff78 goto pc+210     ; R2=0xffffff78 // -136
...
258: (71) r1 = *(u8 *)(r4 +0)
R4 invalid mem access 'scalar'

The error might confuse most bpf authors, since fp-304 slot had 'pkt'
pointer at insn 192 and became 'scalar' at 238. That happened because
bpf_skb_store_bytes() clears all packet pointers including those in
the stack. On the first glance it might look like a bug in the source
code, since ctx->data pointer should have been reloaded after the call
to bpf_skb_store_bytes().

The relevant part of cilium source code looks like this:

// bpf/lib/nodeport.h
int dsr_set_ipip6()
{
	if (ctx_adjust_hroom(...))
		return DROP_INVALID; // -134
	if (ctx_store_bytes(...))
		return DROP_WRITE_ERROR; // -141
	return 0;
}

bool dsr_fail_needs_reply(int code)
{
	if (code == DROP_FRAG_NEEDED) // -136
		return true;
	return false;
}

tail_nodeport_ipv6_dsr()
{
	ret = dsr_set_ipip6(...);
	if (!IS_ERR(ret)) {
		...
	} else {
		if (dsr_fail_needs_reply(ret))
			return dsr_reply_icmp6(...);
	}
}

The code doesn't have arithmetic shift by 31 and it reloads ctx->data
every time it needs to access it. So it's not a bug in the source code.

The reason is DAGCombiner::foldSelectCCToShiftAnd() LLVM transformation:

  // If this is a select where the false operand is zero and the compare is a
  // check of the sign bit, see if we can perform the "gzip trick":
  // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
  // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A

The conditional branch in dsr_set_ipip6() and its return values
are optimized into BPF_ARSH plus BPF_AND:

227: (85) call bpf_skb_store_bytes#9
228: (bc) w2 = w0
229: (c4) w2 s>>= 31   ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
230: (54) w2 &= -134   ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))

after insn 230 the register w2 can only be 0 or -134,
but the verifier approximates it, since there is no way to
represent two scalars in bpf_reg_state.
After fallthough at insn 232 the w2 can only be -134,
hence the branch at insn
239: (56) if w2 != -136 goto pc+210
should be always taken, and trapping insn 258 should never execute.
LLVM generated correct code, but the verifier follows impossible
path and rejects valid program. To fix this issue recognize this
special LLVM optimization and fork the verifier state.
So after insn 229: (c4) w2 s>>= 31
the verifier has two states to explore:
one with w2 = 0 and another with w2 = 0xffffffff
which makes the verifier accept bpf_wiregard.c

Note there are 20+ such patterns in bpf_wiregard.o compiled
with -O1 and -O2, but they're rarely seen in other production
bpf programs, so push_stack() approach is not a concern.

Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c9da70dd3e72..6dbcfae5615b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15490,6 +15490,35 @@ static bool is_safe_to_compute_dst_reg_range(struct bpf_insn *insn,
 	}
 }
 
+static int maybe_fork_scalars(struct bpf_verifier_env *env, struct bpf_insn *insn,
+			      struct bpf_reg_state *dst_reg)
+{
+	struct bpf_verifier_state *branch;
+	struct bpf_reg_state *regs;
+	bool alu32;
+
+	if (dst_reg->smin_value == -1 && dst_reg->smax_value == 0)
+		alu32 = false;
+	else if (dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0)
+		alu32 = true;
+	else
+		return 0;
+
+	branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
+	if (IS_ERR(branch))
+		return PTR_ERR(branch);
+
+	regs = branch->frame[branch->curframe]->regs;
+	if (alu32) {
+		__mark_reg32_known(&regs[insn->dst_reg], 0);
+		__mark_reg32_known(dst_reg, -1ull);
+	} else {
+		__mark_reg_known(&regs[insn->dst_reg], 0);
+		__mark_reg_known(dst_reg, -1ull);
+	}
+	return 0;
+}
+
 /* WARNING: This function does calculations on 64-bit values, but the actual
  * execution may occur on 32-bit values. Therefore, things like bitshifts
  * need extra checks in the 32-bit case.
@@ -15552,6 +15581,11 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
 		scalar_min_max_mul(dst_reg, &src_reg);
 		break;
 	case BPF_AND:
+		if (tnum_is_const(src_reg.var_off)) {
+			ret = maybe_fork_scalars(env, insn, dst_reg);
+			if (ret)
+				return ret;
+		}
 		dst_reg->var_off = tnum_and(dst_reg->var_off, src_reg.var_off);
 		scalar32_min_max_and(dst_reg, &src_reg);
 		scalar_min_max_and(dst_reg, &src_reg);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 2/2] selftests/bpf: Add tests for s>>=31 and s>>=63
  2026-01-03  2:23 [PATCH bpf-next v3 0/2] bpf: Recognize special arithmetic shift in the verifier Puranjay Mohan
  2026-01-03  2:23 ` [PATCH bpf-next v3 1/2] " Puranjay Mohan
@ 2026-01-03  2:23 ` Puranjay Mohan
  1 sibling, 0 replies; 13+ messages in thread
From: Puranjay Mohan @ 2026-01-03  2:23 UTC (permalink / raw)
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	kernel-team, sunhao.th

From: Alexei Starovoitov <ast@kernel.org>

Add tests for special arithmetic shift right.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 .../selftests/bpf/progs/verifier_subreg.c     | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/verifier_subreg.c b/tools/testing/selftests/bpf/progs/verifier_subreg.c
index b3e1c3eef9ae..531674f907fb 100644
--- a/tools/testing/selftests/bpf/progs/verifier_subreg.c
+++ b/tools/testing/selftests/bpf/progs/verifier_subreg.c
@@ -738,4 +738,49 @@ __naked void ldx_w_zero_extend_check(void)
 	: __clobber_all);
 }
 
+SEC("socket")
+__description("s>>=31")
+__success __success_unpriv __retval(0)
+__naked void arsh_31(void)
+{
+	/* Below is what LLVM generates in cilium's bpf_wiregard.o */
+	asm volatile ("					\
+	call %[bpf_get_prandom_u32];			\
+	w2 = w0;					\
+	w2 s>>= 31;					\
+	w2 &= -134; /* w2 becomes 0 or -134 */		\
+	if w2 s> -1 goto +2;				\
+	/* Branch always taken because w2 = -134 */	\
+	if w2 != -136 goto +1;				\
+	w0 /= 0;					\
+	w0 = 0;						\
+	exit;						\
+"	:
+	: __imm(bpf_get_prandom_u32)
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("s>>=63")
+__success __success_unpriv __retval(0)
+__naked void arsh_63(void)
+{
+	/* Copy of arsh_31 with s/w/r/ */
+	asm volatile ("					\
+	call %[bpf_get_prandom_u32];			\
+	r2 = r0;					\
+	r2 <<= 32;					\
+	r2 s>>= 63;					\
+	r2 &= -134;					\
+	if r2 s> -1 goto +2;				\
+	/* Branch always taken because w2 = -134 */	\
+	if r2 != -136 goto +1;				\
+	r0 /= 0;					\
+	r0 = 0;						\
+	exit;						\
+"	:
+	: __imm(bpf_get_prandom_u32)
+	: __clobber_all);
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-03  2:23 ` [PATCH bpf-next v3 1/2] " Puranjay Mohan
@ 2026-01-06 17:21   ` Andrii Nakryiko
  2026-01-08 18:28     ` Puranjay Mohan
  2026-01-08  1:03   ` Eduard Zingerman
  1 sibling, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2026-01-06 17:21 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: bpf, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Mykyta Yatsenko, kernel-team, sunhao.th

On Fri, Jan 2, 2026 at 6:24 PM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> cilium bpf_wiregard.bpf.c when compiled with -O1 fails to load
> with the following verifier log:
>
> 192: (79) r2 = *(u64 *)(r10 -304)     ; R2=pkt(r=40) R10=fp0 fp-304=pkt(r=40)
> ...
> 227: (85) call bpf_skb_store_bytes#9          ; R0=scalar()
> 228: (bc) w2 = w0                     ; R0=scalar() R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> 229: (c4) w2 s>>= 31                  ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
> 230: (54) w2 &= -134                  ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))
> ...
> 232: (66) if w2 s> 0xffffffff goto pc+125     ; R2=scalar(smin=umin=umin32=0x80000000,smax=umax=umax32=0xffffff7a,smax32=-134,var_off=(0x80000000; 0x7fffff7a))
> ...
> 238: (79) r4 = *(u64 *)(r10 -304)     ; R4=scalar() R10=fp0 fp-304=scalar()
> 239: (56) if w2 != 0xffffff78 goto pc+210     ; R2=0xffffff78 // -136
> ...
> 258: (71) r1 = *(u8 *)(r4 +0)
> R4 invalid mem access 'scalar'
>
> The error might confuse most bpf authors, since fp-304 slot had 'pkt'
> pointer at insn 192 and became 'scalar' at 238. That happened because
> bpf_skb_store_bytes() clears all packet pointers including those in
> the stack. On the first glance it might look like a bug in the source
> code, since ctx->data pointer should have been reloaded after the call
> to bpf_skb_store_bytes().
>
> The relevant part of cilium source code looks like this:
>
> // bpf/lib/nodeport.h
> int dsr_set_ipip6()
> {
>         if (ctx_adjust_hroom(...))
>                 return DROP_INVALID; // -134
>         if (ctx_store_bytes(...))
>                 return DROP_WRITE_ERROR; // -141
>         return 0;
> }
>
> bool dsr_fail_needs_reply(int code)
> {
>         if (code == DROP_FRAG_NEEDED) // -136
>                 return true;
>         return false;
> }
>
> tail_nodeport_ipv6_dsr()
> {
>         ret = dsr_set_ipip6(...);
>         if (!IS_ERR(ret)) {
>                 ...
>         } else {
>                 if (dsr_fail_needs_reply(ret))
>                         return dsr_reply_icmp6(...);
>         }
> }
>
> The code doesn't have arithmetic shift by 31 and it reloads ctx->data
> every time it needs to access it. So it's not a bug in the source code.
>
> The reason is DAGCombiner::foldSelectCCToShiftAnd() LLVM transformation:
>
>   // If this is a select where the false operand is zero and the compare is a
>   // check of the sign bit, see if we can perform the "gzip trick":
>   // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
>   // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A
>
> The conditional branch in dsr_set_ipip6() and its return values
> are optimized into BPF_ARSH plus BPF_AND:
>
> 227: (85) call bpf_skb_store_bytes#9
> 228: (bc) w2 = w0
> 229: (c4) w2 s>>= 31   ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
> 230: (54) w2 &= -134   ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))
>
> after insn 230 the register w2 can only be 0 or -134,
> but the verifier approximates it, since there is no way to
> represent two scalars in bpf_reg_state.
> After fallthough at insn 232 the w2 can only be -134,
> hence the branch at insn
> 239: (56) if w2 != -136 goto pc+210
> should be always taken, and trapping insn 258 should never execute.
> LLVM generated correct code, but the verifier follows impossible
> path and rejects valid program. To fix this issue recognize this
> special LLVM optimization and fork the verifier state.
> So after insn 229: (c4) w2 s>>= 31
> the verifier has two states to explore:
> one with w2 = 0 and another with w2 = 0xffffffff
> which makes the verifier accept bpf_wiregard.c
>
> Note there are 20+ such patterns in bpf_wiregard.o compiled
> with -O1 and -O2, but they're rarely seen in other production
> bpf programs, so push_stack() approach is not a concern.
>
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>  kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index c9da70dd3e72..6dbcfae5615b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15490,6 +15490,35 @@ static bool is_safe_to_compute_dst_reg_range(struct bpf_insn *insn,
>         }
>  }
>
> +static int maybe_fork_scalars(struct bpf_verifier_env *env, struct bpf_insn *insn,
> +                             struct bpf_reg_state *dst_reg)
> +{
> +       struct bpf_verifier_state *branch;
> +       struct bpf_reg_state *regs;
> +       bool alu32;
> +
> +       if (dst_reg->smin_value == -1 && dst_reg->smax_value == 0)
> +               alu32 = false;
> +       else if (dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0)
> +               alu32 = true;
> +       else
> +               return 0;
> +
> +       branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
> +       if (IS_ERR(branch))
> +               return PTR_ERR(branch);
> +
> +       regs = branch->frame[branch->curframe]->regs;
> +       if (alu32) {
> +               __mark_reg32_known(&regs[insn->dst_reg], 0);
> +               __mark_reg32_known(dst_reg, -1ull);

Did you get a chance to run veristat with these changes against
selftests, scx, maybe also Meta BPF objects? Curious if there are any
noticeable differences.

Also, the choice of which branch gets zero vs one set might have
meaningful differences (heuristically, "linear path" should preferably
explore conditions that lead to more complete and complex states), so
I'd be interested to see if and what difference does it make in
practice.

> +       } else {
> +               __mark_reg_known(&regs[insn->dst_reg], 0);
> +               __mark_reg_known(dst_reg, -1ull);
> +       }
> +       return 0;
> +}
> +
>  /* WARNING: This function does calculations on 64-bit values, but the actual
>   * execution may occur on 32-bit values. Therefore, things like bitshifts
>   * need extra checks in the 32-bit case.
> @@ -15552,6 +15581,11 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>                 scalar_min_max_mul(dst_reg, &src_reg);
>                 break;
>         case BPF_AND:
> +               if (tnum_is_const(src_reg.var_off)) {
> +                       ret = maybe_fork_scalars(env, insn, dst_reg);
> +                       if (ret)
> +                               return ret;
> +               }
>                 dst_reg->var_off = tnum_and(dst_reg->var_off, src_reg.var_off);
>                 scalar32_min_max_and(dst_reg, &src_reg);
>                 scalar_min_max_and(dst_reg, &src_reg);
> --
> 2.47.3
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-03  2:23 ` [PATCH bpf-next v3 1/2] " Puranjay Mohan
  2026-01-06 17:21   ` Andrii Nakryiko
@ 2026-01-08  1:03   ` Eduard Zingerman
  2026-01-08  1:07     ` Alexei Starovoitov
  1 sibling, 1 reply; 13+ messages in thread
From: Eduard Zingerman @ 2026-01-08  1:03 UTC (permalink / raw)
  To: Puranjay Mohan, bpf
  Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Mykyta Yatsenko, kernel-team, sunhao.th

On Fri, 2026-01-02 at 18:23 -0800, Puranjay Mohan wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> cilium bpf_wiregard.bpf.c when compiled with -O1 fails to load
> with the following verifier log:
> 
> 192: (79) r2 = *(u64 *)(r10 -304)     ; R2=pkt(r=40) R10=fp0 fp-304=pkt(r=40)
> ...
> 227: (85) call bpf_skb_store_bytes#9          ; R0=scalar()
> 228: (bc) w2 = w0                     ; R0=scalar() R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> 229: (c4) w2 s>>= 31                  ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
> 230: (54) w2 &= -134                  ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))
> ...
> 232: (66) if w2 s> 0xffffffff goto pc+125     ; R2=scalar(smin=umin=umin32=0x80000000,smax=umax=umax32=0xffffff7a,smax32=-134,var_off=(0x80000000; 0x7fffff7a))
> ...
> 238: (79) r4 = *(u64 *)(r10 -304)     ; R4=scalar() R10=fp0 fp-304=scalar()
> 239: (56) if w2 != 0xffffff78 goto pc+210     ; R2=0xffffff78 // -136
> ...
> 258: (71) r1 = *(u8 *)(r4 +0)
> R4 invalid mem access 'scalar'
> 
> The error might confuse most bpf authors, since fp-304 slot had 'pkt'
> pointer at insn 192 and became 'scalar' at 238. That happened because
> bpf_skb_store_bytes() clears all packet pointers including those in
> the stack. On the first glance it might look like a bug in the source
> code, since ctx->data pointer should have been reloaded after the call
> to bpf_skb_store_bytes().
> 
> The relevant part of cilium source code looks like this:
> 
> // bpf/lib/nodeport.h
> int dsr_set_ipip6()
> {
> 	if (ctx_adjust_hroom(...))
> 		return DROP_INVALID; // -134
> 	if (ctx_store_bytes(...))
> 		return DROP_WRITE_ERROR; // -141
> 	return 0;
> }
> 
> bool dsr_fail_needs_reply(int code)
> {
> 	if (code == DROP_FRAG_NEEDED) // -136
> 		return true;
> 	return false;
> }
> 
> tail_nodeport_ipv6_dsr()
> {
> 	ret = dsr_set_ipip6(...);
> 	if (!IS_ERR(ret)) {
> 		...
> 	} else {
> 		if (dsr_fail_needs_reply(ret))
> 			return dsr_reply_icmp6(...);
> 	}
> }
> 
> The code doesn't have arithmetic shift by 31 and it reloads ctx->data
> every time it needs to access it. So it's not a bug in the source code.
> 
> The reason is DAGCombiner::foldSelectCCToShiftAnd() LLVM transformation:
> 
>   // If this is a select where the false operand is zero and the compare is a
>   // check of the sign bit, see if we can perform the "gzip trick":
>   // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
>   // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A
> 
> The conditional branch in dsr_set_ipip6() and its return values
> are optimized into BPF_ARSH plus BPF_AND:
> 
> 227: (85) call bpf_skb_store_bytes#9
> 228: (bc) w2 = w0
> 229: (c4) w2 s>>= 31   ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
> 230: (54) w2 &= -134   ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))
> 
> after insn 230 the register w2 can only be 0 or -134,
> but the verifier approximates it, since there is no way to
> represent two scalars in bpf_reg_state.
> After fallthough at insn 232 the w2 can only be -134,
> hence the branch at insn
> 239: (56) if w2 != -136 goto pc+210
> should be always taken, and trapping insn 258 should never execute.
> LLVM generated correct code, but the verifier follows impossible
> path and rejects valid program. To fix this issue recognize this
> special LLVM optimization and fork the verifier state.
> So after insn 229: (c4) w2 s>>= 31
> the verifier has two states to explore:
> one with w2 = 0 and another with w2 = 0xffffffff
> which makes the verifier accept bpf_wiregard.c
> 
> Note there are 20+ such patterns in bpf_wiregard.o compiled
> with -O1 and -O2, but they're rarely seen in other production
> bpf programs, so push_stack() approach is not a concern.
> 
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---

Hi Puranjay,

Nit: please drop acks if patch logic changes significantly
     (e.g. in this case the maybe_fork_scalars() call was moved).

I second Andrii's question and am also curious if there is a
verification performance difference between v2 (fork at arsh) and v3
(fork at and)?

>  kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index c9da70dd3e72..6dbcfae5615b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15490,6 +15490,35 @@ static bool is_safe_to_compute_dst_reg_range(struct bpf_insn *insn,
>  	}
>  }
>  
> +static int maybe_fork_scalars(struct bpf_verifier_env *env, struct bpf_insn *insn,
> +			      struct bpf_reg_state *dst_reg)
> +{
> +	struct bpf_verifier_state *branch;
> +	struct bpf_reg_state *regs;
> +	bool alu32;
> +
> +	if (dst_reg->smin_value == -1 && dst_reg->smax_value == 0)
> +		alu32 = false;
> +	else if (dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0)
> +		alu32 = true;
> +	else

If we rely on specific dst_reg range, do we need to mark it as precise?

> +		return 0;
> +
> +	branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
> +	if (IS_ERR(branch))
> +		return PTR_ERR(branch);
> +
> +	regs = branch->frame[branch->curframe]->regs;
> +	if (alu32) {
> +		__mark_reg32_known(&regs[insn->dst_reg], 0);
> +		__mark_reg32_known(dst_reg, -1ull);
> +	} else {
> +		__mark_reg_known(&regs[insn->dst_reg], 0);
> +		__mark_reg_known(dst_reg, -1ull);
> +	}
> +	return 0;
> +}
> +
>  /* WARNING: This function does calculations on 64-bit values, but the actual
>   * execution may occur on 32-bit values. Therefore, things like bitshifts
>   * need extra checks in the 32-bit case.
> @@ -15552,6 +15581,11 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>  		scalar_min_max_mul(dst_reg, &src_reg);
>  		break;
>  	case BPF_AND:
> +		if (tnum_is_const(src_reg.var_off)) {
> +			ret = maybe_fork_scalars(env, insn, dst_reg);
> +			if (ret)
> +				return ret;
> +		}
>  		dst_reg->var_off = tnum_and(dst_reg->var_off, src_reg.var_off);
>  		scalar32_min_max_and(dst_reg, &src_reg);
>  		scalar_min_max_and(dst_reg, &src_reg);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-08  1:03   ` Eduard Zingerman
@ 2026-01-08  1:07     ` Alexei Starovoitov
  2026-01-08  1:19       ` Eduard Zingerman
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2026-01-08  1:07 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Puranjay Mohan, bpf, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team, Hao Sun

On Wed, Jan 7, 2026 at 5:03 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> > +
> > +     if (dst_reg->smin_value == -1 && dst_reg->smax_value == 0)
> > +             alu32 = false;
> > +     else if (dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0)
> > +             alu32 = true;
> > +     else
>
> If we rely on specific dst_reg range, do we need to mark it as precise?

I don't think so. We're not relying on the specific range.
Just like plain AND&5 would convert [0,10] range into 5.
It won't be marking it precise until there is a need to treat it as precise
somewhere down the verification path.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-08  1:07     ` Alexei Starovoitov
@ 2026-01-08  1:19       ` Eduard Zingerman
  0 siblings, 0 replies; 13+ messages in thread
From: Eduard Zingerman @ 2026-01-08  1:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Puranjay Mohan, bpf, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team, Hao Sun

On Wed, 2026-01-07 at 17:07 -0800, Alexei Starovoitov wrote:
> On Wed, Jan 7, 2026 at 5:03 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > > +
> > > +     if (dst_reg->smin_value == -1 && dst_reg->smax_value == 0)
> > > +             alu32 = false;
> > > +     else if (dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0)
> > > +             alu32 = true;
> > > +     else
> > 
> > If we rely on specific dst_reg range, do we need to mark it as precise?
> 
> I don't think so. We're not relying on the specific range.
> Just like plain AND&5 would convert [0,10] range into 5.
> It won't be marking it precise until there is a need to treat it as precise
> somewhere down the verification path.

Hm, makes sense, thank you for explaining.
In that case, lgtm.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-06 17:21   ` Andrii Nakryiko
@ 2026-01-08 18:28     ` Puranjay Mohan
  2026-01-08 18:45       ` Eduard Zingerman
  0 siblings, 1 reply; 13+ messages in thread
From: Puranjay Mohan @ 2026-01-08 18:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Mykyta Yatsenko, kernel-team, sunhao.th

On Tue, Jan 6, 2026 at 5:22 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Jan 2, 2026 at 6:24 PM Puranjay Mohan <puranjay@kernel.org> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > cilium bpf_wiregard.bpf.c when compiled with -O1 fails to load
> > with the following verifier log:
> >
> > 192: (79) r2 = *(u64 *)(r10 -304)     ; R2=pkt(r=40) R10=fp0 fp-304=pkt(r=40)
> > ...
> > 227: (85) call bpf_skb_store_bytes#9          ; R0=scalar()
> > 228: (bc) w2 = w0                     ; R0=scalar() R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> > 229: (c4) w2 s>>= 31                  ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
> > 230: (54) w2 &= -134                  ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))
> > ...
> > 232: (66) if w2 s> 0xffffffff goto pc+125     ; R2=scalar(smin=umin=umin32=0x80000000,smax=umax=umax32=0xffffff7a,smax32=-134,var_off=(0x80000000; 0x7fffff7a))
> > ...
> > 238: (79) r4 = *(u64 *)(r10 -304)     ; R4=scalar() R10=fp0 fp-304=scalar()
> > 239: (56) if w2 != 0xffffff78 goto pc+210     ; R2=0xffffff78 // -136
> > ...
> > 258: (71) r1 = *(u8 *)(r4 +0)
> > R4 invalid mem access 'scalar'
> >
> > The error might confuse most bpf authors, since fp-304 slot had 'pkt'
> > pointer at insn 192 and became 'scalar' at 238. That happened because
> > bpf_skb_store_bytes() clears all packet pointers including those in
> > the stack. On the first glance it might look like a bug in the source
> > code, since ctx->data pointer should have been reloaded after the call
> > to bpf_skb_store_bytes().
> >
> > The relevant part of cilium source code looks like this:
> >
> > // bpf/lib/nodeport.h
> > int dsr_set_ipip6()
> > {
> >         if (ctx_adjust_hroom(...))
> >                 return DROP_INVALID; // -134
> >         if (ctx_store_bytes(...))
> >                 return DROP_WRITE_ERROR; // -141
> >         return 0;
> > }
> >
> > bool dsr_fail_needs_reply(int code)
> > {
> >         if (code == DROP_FRAG_NEEDED) // -136
> >                 return true;
> >         return false;
> > }
> >
> > tail_nodeport_ipv6_dsr()
> > {
> >         ret = dsr_set_ipip6(...);
> >         if (!IS_ERR(ret)) {
> >                 ...
> >         } else {
> >                 if (dsr_fail_needs_reply(ret))
> >                         return dsr_reply_icmp6(...);
> >         }
> > }
> >
> > The code doesn't have arithmetic shift by 31 and it reloads ctx->data
> > every time it needs to access it. So it's not a bug in the source code.
> >
> > The reason is DAGCombiner::foldSelectCCToShiftAnd() LLVM transformation:
> >
> >   // If this is a select where the false operand is zero and the compare is a
> >   // check of the sign bit, see if we can perform the "gzip trick":
> >   // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
> >   // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A
> >
> > The conditional branch in dsr_set_ipip6() and its return values
> > are optimized into BPF_ARSH plus BPF_AND:
> >
> > 227: (85) call bpf_skb_store_bytes#9
> > 228: (bc) w2 = w0
> > 229: (c4) w2 s>>= 31   ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
> > 230: (54) w2 &= -134   ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))
> >
> > after insn 230 the register w2 can only be 0 or -134,
> > but the verifier approximates it, since there is no way to
> > represent two scalars in bpf_reg_state.
> > After fallthough at insn 232 the w2 can only be -134,
> > hence the branch at insn
> > 239: (56) if w2 != -136 goto pc+210
> > should be always taken, and trapping insn 258 should never execute.
> > LLVM generated correct code, but the verifier follows impossible
> > path and rejects valid program. To fix this issue recognize this
> > special LLVM optimization and fork the verifier state.
> > So after insn 229: (c4) w2 s>>= 31
> > the verifier has two states to explore:
> > one with w2 = 0 and another with w2 = 0xffffffff
> > which makes the verifier accept bpf_wiregard.c
> >
> > Note there are 20+ such patterns in bpf_wiregard.o compiled
> > with -O1 and -O2, but they're rarely seen in other production
> > bpf programs, so push_stack() approach is not a concern.
> >
> > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> > ---
> >  kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index c9da70dd3e72..6dbcfae5615b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -15490,6 +15490,35 @@ static bool is_safe_to_compute_dst_reg_range(struct bpf_insn *insn,
> >         }
> >  }
> >
> > +static int maybe_fork_scalars(struct bpf_verifier_env *env, struct bpf_insn *insn,
> > +                             struct bpf_reg_state *dst_reg)
> > +{
> > +       struct bpf_verifier_state *branch;
> > +       struct bpf_reg_state *regs;
> > +       bool alu32;
> > +
> > +       if (dst_reg->smin_value == -1 && dst_reg->smax_value == 0)
> > +               alu32 = false;
> > +       else if (dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0)
> > +               alu32 = true;
> > +       else
> > +               return 0;
> > +
> > +       branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
> > +       if (IS_ERR(branch))
> > +               return PTR_ERR(branch);
> > +
> > +       regs = branch->frame[branch->curframe]->regs;
> > +       if (alu32) {
> > +               __mark_reg32_known(&regs[insn->dst_reg], 0);
> > +               __mark_reg32_known(dst_reg, -1ull);
>
> Did you get a chance to run veristat with these changes against
> selftests, scx, maybe also Meta BPF objects? Curious if there are any
> noticeable differences.
>
> Also, the choice of which branch gets zero vs one set might have
> meaningful differences (heuristically, "linear path" should preferably
> explore conditions that lead to more complete and complex states), so
> I'd be interested to see if and what difference does it make in
> practice.


I have data using selftests, meta, and sched-ext bpf objects. I
compared baselined (bpf-next/master), previous version of this patch
(fork after arsh), and this patch (fork before and), here are the
results:

I used this command: ./veristat -C -e file,prog,states,insns -f
"insns_pct>1" before.csv after.csv

I did not see any change from baseline -> fork_after_arsh ->
fork_before_and for all selftests and meta bpf programs.

But for one sched_ext program I saw this:

../../veristat/src/veristat -C -e file,prog,states,insns -f
"insns_pct>1" sched_baseline.csv sched_fork_after_arsh.csv
File          Program              States (A)  States (B)  States
(DIFF)  Insns (A)  Insns (B)  Insns (DIFF)
------------  -------------------  ----------  ----------
-------------  ---------  ---------  ------------
scxtop.bpf.o  schedule_stop_trace           1           1    +0
(+0.00%)         14         16  +2 (+14.29%)


../../veristat/src/veristat -C -e file,prog,states,insns -f
"insns_pct>1" sched_fork_after_arsh.csv sched_fork_before_and.csv
File          Program              States (A)  States (B)  States
(DIFF)  Insns (A)  Insns (B)  Insns (DIFF)
------------  -------------------  ----------  ----------
-------------  ---------  ---------  ------------
scxtop.bpf.o  schedule_stop_trace           1           1    +0
(+0.00%)         16         15   -1 (-6.25%)


So, I think we should go with the fork before and version (this patch)
because the sequence we want to detect is arsh -> and and therefore
forking after arsh not optimal because every such arsh will not be
followed by an AND operation with a constant.

This is what you see when you compare this version (fork before and)
and previous (fork after arsh) on the selftests added in this set:

../../veristat/src/veristat -C -e file,prog,states,insns -f
"insns_pct>1" fork_after_arsh fork_before_and
File                   Program  States (A)  States (B)  States (DIFF)
Insns (A)  Insns (B)  Insns (DIFF)
---------------------  -------  ----------  ----------  -------------
---------  ---------  ------------
verifier_subreg.bpf.o  arsh_31           1           1    +0 (+0.00%)
       12         11   -1 (-8.33%)
verifier_subreg.bpf.o  arsh_63           1           1    +0 (+0.00%)
       12         11   -1 (-8.33%)

Thanks,
Puranjy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-08 18:28     ` Puranjay Mohan
@ 2026-01-08 18:45       ` Eduard Zingerman
  2026-01-09  1:18         ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Eduard Zingerman @ 2026-01-08 18:45 UTC (permalink / raw)
  To: Puranjay Mohan, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
	kernel-team, sunhao.th

On Thu, 2026-01-08 at 18:28 +0000, Puranjay Mohan wrote:

[...]

> This is what you see when you compare this version (fork before and)
> and previous (fork after arsh) on the selftests added in this set:
> 
> ../../veristat/src/veristat -C -e file,prog,states,insns -f
> "insns_pct>1" fork_after_arsh fork_before_and
> File                   Program  States (A)  States (B)  States (DIFF)
> Insns (A)  Insns (B)  Insns (DIFF)
> ---------------------  -------  ----------  ----------  -------------
> ---------  ---------  ------------
> verifier_subreg.bpf.o  arsh_31           1           1    +0 (+0.00%)
>        12         11   -1 (-8.33%)
> verifier_subreg.bpf.o  arsh_63           1           1    +0 (+0.00%)
>        12         11   -1 (-8.33%)

Given that difference is very small, I'd prefer forking after arsh.
Could you please take a cursory look at DAGCombiner implementation and
try to check if there are other patterns that use arsh or arsh+and is
the only one?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-08 18:45       ` Eduard Zingerman
@ 2026-01-09  1:18         ` Alexei Starovoitov
  2026-01-09  2:07           ` Eduard Zingerman
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2026-01-09  1:18 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Puranjay Mohan, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team, Hao Sun

On Thu, Jan 8, 2026 at 10:45 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2026-01-08 at 18:28 +0000, Puranjay Mohan wrote:
>
> [...]
>
> > This is what you see when you compare this version (fork before and)
> > and previous (fork after arsh) on the selftests added in this set:
> >
> > ../../veristat/src/veristat -C -e file,prog,states,insns -f
> > "insns_pct>1" fork_after_arsh fork_before_and
> > File                   Program  States (A)  States (B)  States (DIFF)
> > Insns (A)  Insns (B)  Insns (DIFF)
> > ---------------------  -------  ----------  ----------  -------------
> > ---------  ---------  ------------
> > verifier_subreg.bpf.o  arsh_31           1           1    +0 (+0.00%)
> >        12         11   -1 (-8.33%)
> > verifier_subreg.bpf.o  arsh_63           1           1    +0 (+0.00%)
> >        12         11   -1 (-8.33%)
>
> Given that difference is very small, I'd prefer forking after arsh.

why?
I thought last time we discussed the conclusion was to fork it
before AND, since at the time of ARSH the range is still properly represented,
so reason to take chances and do it early.

> Could you please take a cursory look at DAGCombiner implementation and
> try to check if there are other patterns that use arsh or arsh+and is
> the only one?

Well, it's in commit log:

  // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
  // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A

I suspect 2nd case should work with 'before AND' approach too,
since 'not' should be compiled into XOR which suppose to [-1,0] ->
into the same [-1, 0]
But better to double check, of course.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-09  1:18         ` Alexei Starovoitov
@ 2026-01-09  2:07           ` Eduard Zingerman
  2026-01-09  2:53             ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Eduard Zingerman @ 2026-01-09  2:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Puranjay Mohan, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team, Hao Sun

On Thu, 2026-01-08 at 17:18 -0800, Alexei Starovoitov wrote:
> On Thu, Jan 8, 2026 at 10:45 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Thu, 2026-01-08 at 18:28 +0000, Puranjay Mohan wrote:
> >
> > [...]
> >
> > > This is what you see when you compare this version (fork before and)
> > > and previous (fork after arsh) on the selftests added in this set:
> > >
> > > ../../veristat/src/veristat -C -e file,prog,states,insns -f
> > > "insns_pct>1" fork_after_arsh fork_before_and
> > > File                   Program  States (A)  States (B)  States (DIFF)
> > > Insns (A)  Insns (B)  Insns (DIFF)
> > > ---------------------  -------  ----------  ----------  -------------
> > > ---------  ---------  ------------
> > > verifier_subreg.bpf.o  arsh_31           1           1    +0 (+0.00%)
> > >        12         11   -1 (-8.33%)
> > > verifier_subreg.bpf.o  arsh_63           1           1    +0 (+0.00%)
> > >        12         11   -1 (-8.33%)
> >
> > Given that difference is very small, I'd prefer forking after arsh.
>
> why?
> I thought last time we discussed the conclusion was to fork it
> before AND, since at the time of ARSH the range is still properly represented,
> so reason to take chances and do it early.
>
> > Could you please take a cursory look at DAGCombiner implementation and
> > try to check if there are other patterns that use arsh or arsh+and is
> > the only one?
>
> Well, it's in commit log:
>
>   // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
>   // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A
>
> I suspect 2nd case should work with 'before AND' approach too,
> since 'not' should be compiled into XOR which suppose to [-1,0] ->
> into the same [-1, 0]
> But better to double check, of course.

Here another example from DAGCombiner.cpp:

  i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1

The same trick, X>>s31 is in range [-1,0].
But we can fork on OR as well, of-course.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-09  2:07           ` Eduard Zingerman
@ 2026-01-09  2:53             ` Alexei Starovoitov
  2026-01-12 19:58               ` Puranjay Mohan
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2026-01-09  2:53 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Puranjay Mohan, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team, Hao Sun

On Thu, Jan 8, 2026 at 6:07 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2026-01-08 at 17:18 -0800, Alexei Starovoitov wrote:
> > On Thu, Jan 8, 2026 at 10:45 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Thu, 2026-01-08 at 18:28 +0000, Puranjay Mohan wrote:
> > >
> > > [...]
> > >
> > > > This is what you see when you compare this version (fork before and)
> > > > and previous (fork after arsh) on the selftests added in this set:
> > > >
> > > > ../../veristat/src/veristat -C -e file,prog,states,insns -f
> > > > "insns_pct>1" fork_after_arsh fork_before_and
> > > > File                   Program  States (A)  States (B)  States (DIFF)
> > > > Insns (A)  Insns (B)  Insns (DIFF)
> > > > ---------------------  -------  ----------  ----------  -------------
> > > > ---------  ---------  ------------
> > > > verifier_subreg.bpf.o  arsh_31           1           1    +0 (+0.00%)
> > > >        12         11   -1 (-8.33%)
> > > > verifier_subreg.bpf.o  arsh_63           1           1    +0 (+0.00%)
> > > >        12         11   -1 (-8.33%)
> > >
> > > Given that difference is very small, I'd prefer forking after arsh.
> >
> > why?
> > I thought last time we discussed the conclusion was to fork it
> > before AND, since at the time of ARSH the range is still properly represented,
> > so reason to take chances and do it early.
> >
> > > Could you please take a cursory look at DAGCombiner implementation and
> > > try to check if there are other patterns that use arsh or arsh+and is
> > > the only one?
> >
> > Well, it's in commit log:
> >
> >   // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
> >   // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A
> >
> > I suspect 2nd case should work with 'before AND' approach too,
> > since 'not' should be compiled into XOR which suppose to [-1,0] ->
> > into the same [-1, 0]
> > But better to double check, of course.
>
> Here another example from DAGCombiner.cpp:
>
>   i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1
>
> The same trick, X>>s31 is in range [-1,0].
> But we can fork on OR as well, of-course.

Good catch. This one can also be done "before OR".
I doubt there will be more combinations that make
"after ARSH" argument stronger.
I guess I don't mind either option. Let's pick.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Recognize special arithmetic shift in the verifier
  2026-01-09  2:53             ` Alexei Starovoitov
@ 2026-01-12 19:58               ` Puranjay Mohan
  0 siblings, 0 replies; 13+ messages in thread
From: Puranjay Mohan @ 2026-01-12 19:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team, Hao Sun

On Fri, Jan 9, 2026 at 3:54 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 8, 2026 at 6:07 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Thu, 2026-01-08 at 17:18 -0800, Alexei Starovoitov wrote:
> > > On Thu, Jan 8, 2026 at 10:45 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > >
> > > > On Thu, 2026-01-08 at 18:28 +0000, Puranjay Mohan wrote:
> > > >
> > > > [...]
> > > >
> > > > > This is what you see when you compare this version (fork before and)
> > > > > and previous (fork after arsh) on the selftests added in this set:
> > > > >
> > > > > ../../veristat/src/veristat -C -e file,prog,states,insns -f
> > > > > "insns_pct>1" fork_after_arsh fork_before_and
> > > > > File                   Program  States (A)  States (B)  States (DIFF)
> > > > > Insns (A)  Insns (B)  Insns (DIFF)
> > > > > ---------------------  -------  ----------  ----------  -------------
> > > > > ---------  ---------  ------------
> > > > > verifier_subreg.bpf.o  arsh_31           1           1    +0 (+0.00%)
> > > > >        12         11   -1 (-8.33%)
> > > > > verifier_subreg.bpf.o  arsh_63           1           1    +0 (+0.00%)
> > > > >        12         11   -1 (-8.33%)
> > > >
> > > > Given that difference is very small, I'd prefer forking after arsh.
> > >
> > > why?
> > > I thought last time we discussed the conclusion was to fork it
> > > before AND, since at the time of ARSH the range is still properly represented,
> > > so reason to take chances and do it early.
> > >
> > > > Could you please take a cursory look at DAGCombiner implementation and
> > > > try to check if there are other patterns that use arsh or arsh+and is
> > > > the only one?
> > >
> > > Well, it's in commit log:
> > >
> > >   // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
> > >   // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A
> > >
> > > I suspect 2nd case should work with 'before AND' approach too,
> > > since 'not' should be compiled into XOR which suppose to [-1,0] ->
> > > into the same [-1, 0]
> > > But better to double check, of course.
> >
> > Here another example from DAGCombiner.cpp:
> >
> >   i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1
> >
> > The same trick, X>>s31 is in range [-1,0].
> > But we can fork on OR as well, of-course.
>
> Good catch. This one can also be done "before OR".
> I doubt there will be more combinations that make
> "after ARSH" argument stronger.
> I guess I don't mind either option. Let's pick.

So, Eduard and I discussed off-list to go with before and / before or,
so I will post the next version with that.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-01-12 19:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-03  2:23 [PATCH bpf-next v3 0/2] bpf: Recognize special arithmetic shift in the verifier Puranjay Mohan
2026-01-03  2:23 ` [PATCH bpf-next v3 1/2] " Puranjay Mohan
2026-01-06 17:21   ` Andrii Nakryiko
2026-01-08 18:28     ` Puranjay Mohan
2026-01-08 18:45       ` Eduard Zingerman
2026-01-09  1:18         ` Alexei Starovoitov
2026-01-09  2:07           ` Eduard Zingerman
2026-01-09  2:53             ` Alexei Starovoitov
2026-01-12 19:58               ` Puranjay Mohan
2026-01-08  1:03   ` Eduard Zingerman
2026-01-08  1:07     ` Alexei Starovoitov
2026-01-08  1:19       ` Eduard Zingerman
2026-01-03  2:23 ` [PATCH bpf-next v3 2/2] selftests/bpf: Add tests for s>>=31 and s>>=63 Puranjay Mohan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox