* [PATCH bpf-next v4 1/2] bpf: Get better reg range with ldsx and 32bit compare
@ 2024-07-18 5:28 Yonghong Song
2024-07-18 5:28 ` [PATCH bpf-next v4 2/2] selftests/bpf: Add reg_bounds tests for ldsx and subreg compare Yonghong Song
2024-07-19 22:46 ` [PATCH bpf-next v4 1/2] bpf: Get better reg range with ldsx and 32bit compare Andrii Nakryiko
0 siblings, 2 replies; 9+ messages in thread
From: Yonghong Song @ 2024-07-18 5:28 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau, Eduard Zingerman, Shung-Hsi Yu
With the latest llvm19, the selftest iters/iter_arr_with_actual_elem_count
failed with -mcpu=v4.
The following are the details:
0: R1=ctx() R10=fp0
; int iter_arr_with_actual_elem_count(const void *ctx) @ iters.c:1420
0: (b4) w7 = 0 ; R7_w=0
; int i, n = loop_data.n, sum = 0; @ iters.c:1422
1: (18) r1 = 0xffffc90000191478 ; R1_w=map_value(map=iters.bss,ks=4,vs=1280,off=1144)
3: (61) r6 = *(u32 *)(r1 +128) ; R1_w=map_value(map=iters.bss,ks=4,vs=1280,off=1144) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
; if (n > ARRAY_SIZE(loop_data.data)) @ iters.c:1424
4: (26) if w6 > 0x20 goto pc+27 ; R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
5: (bf) r8 = r10 ; R8_w=fp0 R10=fp0
6: (07) r8 += -8 ; R8_w=fp-8
; bpf_for(i, 0, n) { @ iters.c:1427
7: (bf) r1 = r8 ; R1_w=fp-8 R8_w=fp-8
8: (b4) w2 = 0 ; R2_w=0
9: (bc) w3 = w6 ; R3_w=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R6_w=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
10: (85) call bpf_iter_num_new#45179 ; R0=scalar() fp-8=iter_num(ref_id=2,state=active,depth=0) refs=2
11: (bf) r1 = r8 ; R1=fp-8 R8=fp-8 refs=2
12: (85) call bpf_iter_num_next#45181 13: R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R6=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R7=0 R8=fp-8 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) refs=2
; bpf_for(i, 0, n) { @ iters.c:1427
13: (15) if r0 == 0x0 goto pc+2 ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) refs=2
14: (81) r1 = *(s32 *)(r0 +0) ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff) refs=2
15: (ae) if w1 < w6 goto pc+4 20: R0=rdonly_mem(id=3,ref_obj_id=2,sz=4) R1=scalar(smin=0xffffffff80000000,smax=smax32=umax32=31,umax=0xffffffff0000001f,smin32=0,var_off=(0x0; 0xffffffff0000001f)) R6=scalar(id=1,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R7=0 R8=fp-8 R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) refs=2
; sum += loop_data.data[i]; @ iters.c:1429
20: (67) r1 <<= 2 ; R1_w=scalar(smax=0x7ffffffc0000007c,umax=0xfffffffc0000007c,smin32=0,smax32=umax32=124,var_off=(0x0; 0xfffffffc0000007c)) refs=2
21: (18) r2 = 0xffffc90000191478 ; R2_w=map_value(map=iters.bss,ks=4,vs=1280,off=1144) refs=2
23: (0f) r2 += r1
math between map_value pointer and register with unbounded min value is not allowed
The source code:
int iter_arr_with_actual_elem_count(const void *ctx)
{
int i, n = loop_data.n, sum = 0;
if (n > ARRAY_SIZE(loop_data.data))
return 0;
bpf_for(i, 0, n) {
/* no rechecking of i against ARRAY_SIZE(loop_data.n) */
sum += loop_data.data[i];
}
return sum;
}
Insn #14 is a sign-extension load which is related to 'int i'.
Insn #15 did a subreg comparison. Note that smin=0xffffffff80000000, and this later caused
insn #23 to fail verification due to the unbounded min value.
Actually, the R1 smin range after insn #15 can be better. Before insn #15, we have
R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
With the above range, we know that for R1 the upper 32 bits can only be 0xffffffff or 0;
otherwise the value of R1 would fall outside [smin=0xffffffff80000000,smax=0x7fffffff].
After insn #15, for the true branch, we know smin32=0 and smax32=32. If the upper 32 bits were
0xffffffff, the corresponding values would lie in [0xffffffff00000000, 0xffffffff00000020]. That range
is obviously beyond the original range [smin=0xffffffff80000000,smax=0x7fffffff], so it
is not possible. Therefore the upper 32 bits must be 0, which implies smin = smin32 and
smax = smax32.
This patch fixes the issue by adding an additional register deduction after a 32-bit compare
insn: if the signed 32-bit register range is non-negative and the 64-bit smin/smax are
within [S32_MIN, S32_MAX], then the actual 64-bit smin/smax must be the same
as the 32-bit smin32/smax32.
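For illustration only, the following is a minimal userspace sketch of that deduction applied to
the R1 bounds from the log above. It is not the verifier code; the struct and function names
(struct bounds, deduce_sext_bounds) are made up for this example:

#include <stdint.h>
#include <stdio.h>

struct bounds { int64_t smin, smax; int32_t smin32, smax32; };

static void deduce_sext_bounds(struct bounds *r)
{
	/* if the signed 32-bit range is non-negative and the signed 64-bit
	 * range already fits in [S32_MIN, S32_MAX], the upper 32 bits cannot
	 * be all 1s, so the 64-bit bounds collapse to the 32-bit ones
	 */
	if (r->smin32 >= 0 && r->smin >= INT32_MIN && r->smax <= INT32_MAX) {
		r->smin = r->smin32;
		r->smax = r->smax32;
	}
}

int main(void)
{
	/* R1 state from the log: smin=0xffffffff80000000 (== S32_MIN),
	 * smax=0x7fffffff, and after the 'w1 < w6' branch smin32=0, smax32=31
	 */
	struct bounds r1 = { .smin = INT32_MIN, .smax = INT32_MAX, .smin32 = 0, .smax32 = 31 };

	deduce_sext_bounds(&r1);
	printf("smin=%lld smax=%lld\n", (long long)r1.smin, (long long)r1.smax); /* smin=0 smax=31 */
	return 0;
}

With smin32 >= 0 and the 64-bit signed bounds already inside [S32_MIN, S32_MAX], the sketch
collapses R1's 64-bit bounds to [0, 31], which is the tighter range the new deduction produces.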
With this patch, iters/iter_arr_with_actual_elem_count succeeded with a better register range:
from 15 to 20: R0=rdonly_mem(id=7,ref_obj_id=2,sz=4) R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=31,var_off=(0x0; 0x1f)) R6=scalar(id=1,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f)) R7=scalar(id=9,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R8=scalar(id=9,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=3) refs=2
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
kernel/bpf/verifier.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8da132a1ef28..46532437c4bb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2182,6 +2182,42 @@ static void __reg_deduce_mixed_bounds(struct bpf_reg_state *reg)
reg->smin_value = max_t(s64, reg->smin_value, new_smin);
reg->smax_value = min_t(s64, reg->smax_value, new_smax);
}
+
+ /* Here we would like to handle a special case after sign extending load,
+ * when upper bits for a 64-bit range are all 1s or all 0s.
+ *
+ * Upper bits are all 1s when register is in a range:
+ * [0xffff_ffff_0000_0000, 0xffff_ffff_ffff_ffff]
+ * Upper bits are all 0s when register is in a range:
+ * [0x0000_0000_0000_0000, 0x0000_0000_ffff_ffff]
+ * Together this forms a continuous range:
+ * [0xffff_ffff_0000_0000, 0x0000_0000_ffff_ffff]
+ *
+ * Now, suppose that register range is in fact tighter:
+ * [0xffff_ffff_8000_0000, 0x0000_0000_ffff_ffff] (R)
+ * Also suppose that its 32-bit range is positive,
+ * meaning that lower 32-bits of the full 64-bit register
+ * are in the range:
+ * [0x0000_0000, 0x7fff_ffff] (W)
+ *
+ * If this happens, then any value in a range:
+ * [0xffff_ffff_0000_0000, 0xffff_ffff_7fff_ffff]
+ * is smaller than a lowest bound of the range (R):
+ * 0xffff_ffff_8000_0000
+ * which means that upper bits of the full 64-bit register
+ * can't be all 1s, when lower bits are in range (W).
+ *
+ * Note that:
+ * - 0xffff_ffff_8000_0000 == (s64)S32_MIN
+ * - 0x0000_0000_ffff_ffff == (s64)S32_MAX
+ * These relations are used in the conditions below.
+ */
+ if (reg->s32_min_value >= 0 && reg->smin_value >= S32_MIN && reg->smax_value <= S32_MAX) {
+ reg->smin_value = reg->umin_value = reg->s32_min_value;
+ reg->smax_value = reg->umax_value = reg->s32_max_value;
+ reg->var_off = tnum_intersect(reg->var_off,
+ tnum_range(reg->smin_value, reg->smax_value));
+ }
}
static void __reg_deduce_bounds(struct bpf_reg_state *reg)
--
2.43.0

* [PATCH bpf-next v4 2/2] selftests/bpf: Add reg_bounds tests for ldsx and subreg compare
From: Yonghong Song @ 2024-07-18 5:28 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

Add a few reg_bounds selftests to test 32/16/8-bit ldsx and subreg comparison.
Without the previous patch, all added tests will fail.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/reg_bounds.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/reg_bounds.c b/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
index eb74363f9f70..cd9bafe9c057 100644
--- a/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
+++ b/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
@@ -441,6 +441,20 @@ static struct range range_refine(enum num_t x_t, struct range x, enum num_t y_t,
 	if (t_is_32(y_t) && !t_is_32(x_t)) {
 		struct range x_swap;

+		/* If we know that
+		 * - *x* is in the range of signed 32bit value
+		 * - *y_cast* range is 32-bit sign non-negative, and
+		 * then *x* range can be narrowed to the interaction of
+		 * *x* and *y_cast*. Otherwise, if the new range for *x*
+		 * allows upper 32-bit 0xffffffff then the eventual new
+		 * range for *x* will be out of signed 32-bit range
+		 * which violates the origin *x* range.
+		 */
+		if (x_t == S64 && y_t == S32 &&
+		    !(y_cast.a & 0xffffffff80000000ULL) && !(y_cast.b & 0xffffffff80000000) &&
+		    (long long)x.a >= S32_MIN && (long long)x.b <= S32_MAX)
+			return range_improve(x_t, x, y_cast);
+
 		/* some combinations of upper 32 bits and sign bit can lead to
 		 * invalid ranges, in such cases it's easier to detect them
 		 * after cast/swap than try to enumerate all the conditions
@@ -2108,6 +2122,9 @@ static struct subtest_case crafted_cases[] = {
 	{S32, U32, {(u32)S32_MIN, 0}, {0, 0}},
 	{S32, U32, {(u32)S32_MIN, 0}, {(u32)S32_MIN, (u32)S32_MIN}},
 	{S32, U32, {(u32)S32_MIN, S32_MAX}, {S32_MAX, S32_MAX}},
+	{S64, U32, {0x0, 0x1f}, {0xffffffff80000000ULL, 0x000000007fffffffULL}},
+	{S64, U32, {0x0, 0x1f}, {0xffffffffffff8000ULL, 0x0000000000007fffULL}},
+	{S64, U32, {0x0, 0x1f}, {0xffffffffffffff80ULL, 0x000000000000007fULL}},
 };

 /* Go over crafted hard-coded cases. This is fast, so we do it as part of
--
2.43.0

* Re: [PATCH bpf-next v4 2/2] selftests/bpf: Add reg_bounds tests for ldsx and subreg compare
From: Eduard Zingerman @ 2024-07-18 20:48 UTC (permalink / raw)
To: Yonghong Song, bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

On Wed, 2024-07-17 at 22:28 -0700, Yonghong Song wrote:
> Add a few reg_bounds selftests to test 32/16/8-bit ldsx and subreg comparison.
> Without the previous patch, all added tests will fail.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

* Re: [PATCH bpf-next v4 2/2] selftests/bpf: Add reg_bounds tests for ldsx and subreg compare
From: Andrii Nakryiko @ 2024-07-19 22:58 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

On Wed, Jul 17, 2024 at 10:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Add a few reg_bounds selftests to test 32/16/8-bit ldsx and subreg comparison.
> Without the previous patch, all added tests will fail.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  .../selftests/bpf/prog_tests/reg_bounds.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>

wow, I already forgot most of the things in here... :(

> diff --git a/tools/testing/selftests/bpf/prog_tests/reg_bounds.c b/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
> index eb74363f9f70..cd9bafe9c057 100644
> --- a/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
> +++ b/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
> @@ -441,6 +441,20 @@ static struct range range_refine(enum num_t x_t, struct range x, enum num_t y_t,
>	if (t_is_32(y_t) && !t_is_32(x_t)) {
>		struct range x_swap;
>
> +		/* If we know that
> +		 * - *x* is in the range of signed 32bit value
> +		 * - *y_cast* range is 32-bit sign non-negative, and

sign -> signed?

> +		 * then *x* range can be narrowed to the interaction of

what does it mean "narrowed to the interaction"?

> +		 * *x* and *y_cast*. Otherwise, if the new range for *x*
> +		 * allows upper 32-bit 0xffffffff then the eventual new
> +		 * range for *x* will be out of signed 32-bit range
> +		 * which violates the origin *x* range.
> +		 */
> +		if (x_t == S64 && y_t == S32 &&

tbh, given this is so specific for x_t == S64 and y_T == S32, I'd move
it out from this if into an independent condition, it doesn't benefit
from being inside

> +		    !(y_cast.a & 0xffffffff80000000ULL) && !(y_cast.b & 0xffffffff80000000) &&

isn't this just a much more convoluted way of checking:

y_cast.a <= 0x7fffffffULL && y_cast.b <= 0x7fffffffULL

? Is & + negation really easier to follow?...

> +		    (long long)x.a >= S32_MIN && (long long)x.b <= S32_MAX)
> +			return range_improve(x_t, x, y_cast);
> +

[...]

* Re: [PATCH bpf-next v4 2/2] selftests/bpf: Add reg_bounds tests for ldsx and subreg compare
From: Yonghong Song @ 2024-07-22 18:11 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

On 7/19/24 3:58 PM, Andrii Nakryiko wrote:
> On Wed, Jul 17, 2024 at 10:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Add a few reg_bounds selftests to test 32/16/8-bit ldsx and subreg comparison.
>> Without the previous patch, all added tests will fail.
[...]
>> +		/* If we know that
>> +		 * - *x* is in the range of signed 32bit value
>> +		 * - *y_cast* range is 32-bit sign non-negative, and
> sign -> signed?

Ack

>> +		 * then *x* range can be narrowed to the interaction of
> what does it mean "narrowed to the interaction"?

Let us change to '*x* range can be improved with *y_cast*.

>> +		 * *x* and *y_cast*. Otherwise, if the new range for *x*
>> +		 * allows upper 32-bit 0xffffffff then the eventual new
>> +		 * range for *x* will be out of signed 32-bit range
>> +		 * which violates the origin *x* range.
>> +		 */
>> +		if (x_t == S64 && y_t == S32 &&
> tbh, given this is so specific for x_t == S64 and y_T == S32, I'd move
> it out from this if into an independent condition, it doesn't benefit
> from being inside

Okay, I can do this.

>> +		    !(y_cast.a & 0xffffffff80000000ULL) && !(y_cast.b & 0xffffffff80000000) &&
> isn't this just a much more convoluted way of checking:
>
> y_cast.a <= 0x7fffffffULL && y_cast.b <= 0x7fffffffULL

Yes, this is indeed simpler. I can use this one.

> ? Is & + negation really easier to follow?...
>
[...]

* Re: [PATCH bpf-next v4 1/2] bpf: Get better reg range with ldsx and 32bit compare
From: Andrii Nakryiko @ 2024-07-19 22:46 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Eduard Zingerman, Shung-Hsi Yu

On Wed, Jul 17, 2024 at 10:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> With latest llvm19, the selftest iters/iter_arr_with_actual_elem_count
> failed with -mcpu=v4.
[...]
> This patch fixed the issue by adding additional register deduction after 32-bit compare

__reg_deduce_mixed_bounds() is called from reg_bounds_sync() pretty
much after every arithmetic operation or any comparison. Is the above
logic true universally or only after signed comparison? If the latter,
then we can't just do it unconditionally inside
__reg_deduce_mixed_bounds().

> insn. If the signed 32-bit register range is non-negative then 64-bit smin is
> in range of [S32_MIN, S32_MAX], then the actual 64-bit smin/smax should be the same
> as 32-bit smin32/smax32.
[...]
> + * Note that:
> + * - 0xffff_ffff_8000_0000 == (s64)S32_MIN
> + * - 0x0000_0000_ffff_ffff == (s64)S32_MAX

?? S32_MAX = 0x7fffffff, so should the right part be U32_MAX or the
left part should be 0x0000_0000_7fff_ffff ?

> + * These relations are used in the conditions below.
> + */
> + if (reg->s32_min_value >= 0 && reg->smin_value >= S32_MIN && reg->smax_value <= S32_MAX) {
> + reg->smin_value = reg->umin_value = reg->s32_min_value;
> + reg->smax_value = reg->umax_value = reg->s32_max_value;

let's please not mix signed and unsigned 32 -> 64 bit conversions,
they are confusing and tricky enough in each domain individually,
there is no point in mixing them

> + reg->var_off = tnum_intersect(reg->var_off,
> + tnum_range(reg->smin_value, reg->smax_value));
> + }
> }
>
> static void __reg_deduce_bounds(struct bpf_reg_state *reg)
> --
> 2.43.0
>

* Re: [PATCH bpf-next v4 1/2] bpf: Get better reg range with ldsx and 32bit compare
From: Eduard Zingerman @ 2024-07-19 23:40 UTC (permalink / raw)
To: Andrii Nakryiko, Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Shung-Hsi Yu

On Fri, 2024-07-19 at 15:46 -0700, Andrii Nakryiko wrote:
[...]
> > + * Note that:
> > + * - 0xffff_ffff_8000_0000 == (s64)S32_MIN
> > + * - 0x0000_0000_ffff_ffff == (s64)S32_MAX
>
> ?? S32_MAX = 0x7fffffff, so should the right part be U32_MAX or the
> left part should be 0x0000_0000_7fff_ffff ?

Oops, that's on me, yes it should be 0x0000_0000_7fff_ffff here.

[...]

* Re: [PATCH bpf-next v4 1/2] bpf: Get better reg range with ldsx and 32bit compare
From: Yonghong Song @ 2024-07-22 18:16 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Eduard Zingerman, Shung-Hsi Yu

On 7/19/24 3:46 PM, Andrii Nakryiko wrote:
> On Wed, Jul 17, 2024 at 10:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> With latest llvm19, the selftest iters/iter_arr_with_actual_elem_count
>> failed with -mcpu=v4.
[...]
>> This patch fixed the issue by adding additional register deduction after 32-bit compare
> __reg_deduce_mixed_bounds() is called from reg_bounds_sync() pretty
> much after every arithmetic operation or any comparison. Is the above
> logic true universally or only after signed comparison? If the latter,
> then we can't just do it unconditionally inside
> __reg_deduce_mixed_bounds().

It is not just for signed extension. Some other arithmetic operation may
produce such a range as well.

[...]
>> + * Note that:
>> + * - 0xffff_ffff_8000_0000 == (s64)S32_MIN
>> + * - 0x0000_0000_ffff_ffff == (s64)S32_MAX
> ?? S32_MAX = 0x7fffffff, so should the right part be U32_MAX or the
> left part should be 0x0000_0000_7fff_ffff ?

Will make a change in the next revision.

>> + * These relations are used in the conditions below.
>> + */
>> + if (reg->s32_min_value >= 0 && reg->smin_value >= S32_MIN && reg->smax_value <= S32_MAX) {
>> + reg->smin_value = reg->umin_value = reg->s32_min_value;
>> + reg->smax_value = reg->umax_value = reg->s32_max_value;
> let's please not mix signed and unsigned 32 -> 64 bit conversions,
> they are confusing and tricky enough in each domain individually,
> there is no point in mixing them

Okay, will do.

[...]

* Re: [PATCH bpf-next v4 1/2] bpf: Get better reg range with ldsx and 32bit compare
From: Andrii Nakryiko @ 2024-07-24 22:56 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Eduard Zingerman, Shung-Hsi Yu

On Mon, Jul 22, 2024 at 11:16 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> On 7/19/24 3:46 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 17, 2024 at 10:28 PM Yonghong Song <yonghong.song@linux.dev> wrote:
[...]
> >> This patch fixed the issue by adding additional register deduction after 32-bit compare
> > __reg_deduce_mixed_bounds() is called from reg_bounds_sync() pretty
> > much after every arithmetic operation or any comparison. Is the above
> > logic true universally or only after signed comparison? If the latter,
> > then we can't just do it unconditionally inside
> > __reg_deduce_mixed_bounds().
>
> It is not just for signed extension. Some other arithmetic operation may
> produce such a range as well.

Agreed. It took me a bit to grok this more intuitively, but I think I
got there. :)

[...]

end of thread, other threads:[~2024-07-24 22:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
2024-07-18  5:28 [PATCH bpf-next v4 1/2] bpf: Get better reg range with ldsx and 32bit compare Yonghong Song
2024-07-18  5:28 ` [PATCH bpf-next v4 2/2] selftests/bpf: Add reg_bounds tests for ldsx and subreg compare Yonghong Song
2024-07-18 20:48   ` Eduard Zingerman
2024-07-19 22:58   ` Andrii Nakryiko
2024-07-22 18:11     ` Yonghong Song
2024-07-19 22:46 ` [PATCH bpf-next v4 1/2] bpf: Get better reg range with ldsx and 32bit compare Andrii Nakryiko
2024-07-19 23:40   ` Eduard Zingerman
2024-07-22 18:16   ` Yonghong Song
2024-07-24 22:56     ` Andrii Nakryiko