BPF List
* [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
@ 2024-01-03 23:26 Yonghong Song
  2024-01-03 23:26 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add a selftest with not-8-byte aligned BPF_ST Yonghong Song
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Yonghong Song @ 2024-01-03 23:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Kuniyuki Iwashima, Martin KaFai Lau

With patch set [1], precision backtracing supports register spill/fill
to/from the stack. Patch [2] allows an initial imprecise register spill
with content 0. This is a common case for cpuv3 and lower when
initializing stack variables with the pattern
  r1 = 0
  *(u64 *)(r10 - 8) = r1
and [2] demonstrated a good verification improvement.

For cpuv4, the initialization could be
  *(u64 *)(r10 - 8) = 0
The current verifier marks the r10-8 contents with STACK_ZERO.
Similar to [2], let us permit the above insn to behave like an
imprecise register spill, which can reduce the number of verified
states. The change is in function check_stack_write_fixed_off().

Before this patch, a spilled zero was marked as STACK_ZERO,
which can provide precise values. In check_stack_write_var_off(),
STACK_ZERO is maintained when writing a const zero, so it can
provide precise values later if needed.

Handling '*(u64 *)(r10 - 8) = 0' as a spill, however, causes an
issue in check_stack_write_var_off(): the spill is converted to
STACK_MISC and the precise value 0 is lost. To fix this, if a spill
slot holds a const zero and the BPF_ST write is also a const zero,
the spill slot is preserved, so it can still provide precise values
later if needed. Without the change in check_stack_write_var_off(),
the test_verifier subtest 'BPF_ST_MEM stack imm zero, variable offset'
would fail.
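The variable-offset rule can be sketched as the following standalone
decision function (an illustrative model with made-up names, not the
actual patch code):

```c
#include <assert.h>

/* Sketch of the rule added to check_stack_write_var_off(): a
 * variable-offset write normally degrades every touched slot to
 * STACK_MISC, but a const-zero write over a slot that already holds a
 * spilled known-zero scalar may keep the spill, so the provable value
 * 0 is not lost. Names are made up for illustration.
 */
enum vslot { VSLOT_MISC, VSLOT_ZERO_SPILL, VSLOT_OTHER_SPILL };

static enum vslot var_off_write(enum vslot cur, int writing_zero)
{
	if (writing_zero && cur == VSLOT_ZERO_SPILL)
		return VSLOT_ZERO_SPILL; /* preserve: contents still provably 0 */
	return VSLOT_MISC;               /* otherwise the contents are unknown */
}
```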

I checked cpuv3 and cpuv4 with and without this patch with veristat.
There is no state change for cpuv3 since '*(u64 *)(r10 - 8) = 0'
is only generated with cpuv4.

For cpuv4:
$ ../veristat -C old.cpuv4.csv new.cpuv4.csv -e file,prog,insns,states -f 'insns_diff!=0'
File                                        Program              Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
------------------------------------------  -------------------  ---------  ---------  ---------------  ----------  ----------  -------------
local_storage_bench.bpf.linked3.o           get_local                  228        168    -60 (-26.32%)          17          14   -3 (-17.65%)
pyperf600_bpf_loop.bpf.linked3.o            on_event                  6066       4889  -1177 (-19.40%)         403         321  -82 (-20.35%)
test_cls_redirect.bpf.linked3.o             cls_redirect             35483      35387     -96 (-0.27%)        2179        2177    -2 (-0.09%)
test_l4lb_noinline.bpf.linked3.o            balancer_ingress          4494       4522     +28 (+0.62%)         217         219    +2 (+0.92%)
test_l4lb_noinline_dynptr.bpf.linked3.o     balancer_ingress          1432       1455     +23 (+1.61%)          92          94    +2 (+2.17%)
test_xdp_noinline.bpf.linked3.o             balancer_ingress_v6       3462       3458      -4 (-0.12%)         216         216    +0 (+0.00%)
verifier_iterating_callbacks.bpf.linked3.o  widening                    52         41    -11 (-21.15%)           4           3   -1 (-25.00%)
xdp_synproxy_kern.bpf.linked3.o             syncookie_tc             12412      11719    -693 (-5.58%)         345         330   -15 (-4.35%)
xdp_synproxy_kern.bpf.linked3.o             syncookie_xdp            12478      11794    -684 (-5.48%)         346         331   -15 (-4.34%)

test_l4lb_noinline and test_l4lb_noinline_dynptr have minor regressions, but
pyperf600_bpf_loop and local_storage_bench get pretty good improvements.

  [1] https://lore.kernel.org/all/20231205184248.1502704-1-andrii@kernel.org/
  [2] https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/

Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/verifier.c                         | 21 +++++++++++++++++--
 .../selftests/bpf/progs/verifier_spill_fill.c | 16 +++++++-------
 2 files changed, 27 insertions(+), 10 deletions(-)

Changelog:
  v1 -> v2:
    - Preserve with-const-zero spill if writing is also zero
      in check_stack_write_var_off().
    - Add a test with not-8-byte-aligned BPF_ST store.

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d4e31f61de0e..cfe7a68d90a5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4491,7 +4491,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
 			state->stack[spi].spilled_ptr.id = 0;
 	} else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
-		   insn->imm != 0 && env->bpf_capable) {
+		   env->bpf_capable) {
 		struct bpf_reg_state fake_reg = {};
 
 		__mark_reg_known(&fake_reg, insn->imm);
@@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
 
 	/* Variable offset writes destroy any spilled pointers in range. */
 	for (i = min_off; i < max_off; i++) {
+		struct bpf_reg_state *spill_reg;
 		u8 new_type, *stype;
-		int slot, spi;
+		int slot, spi, j;
 
 		slot = -i - 1;
 		spi = slot / BPF_REG_SIZE;
+
+		/* If writing_zero and the spi slot contains a spill of value 0,
+		 * maintain the spill type.
+		 */
+		if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
+			spill_reg = &state->stack[spi].spilled_ptr;
+			if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {
+				for (j = BPF_REG_SIZE; j > 0; j--) {
+					if (state->stack[spi].slot_type[j - 1] != STACK_SPILL)
+						break;
+				}
+				i += BPF_REG_SIZE - j - 1;
+				continue;
+			}
+		}
+
 		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
 		mark_stack_slot_scratched(env, spi);
 
diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
index 39fe3372e0e0..d4b3188afe07 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
@@ -495,14 +495,14 @@ char single_byte_buf[1] SEC(".data.single_byte_buf");
 SEC("raw_tp")
 __log_level(2)
 __success
-/* make sure fp-8 is all STACK_ZERO */
-__msg("2: (7a) *(u64 *)(r10 -8) = 0          ; R10=fp0 fp-8_w=00000000")
+/* fp-8 is spilled IMPRECISE value zero (represented by a zero value fake reg) */
+__msg("2: (7a) *(u64 *)(r10 -8) = 0          ; R10=fp0 fp-8_w=0")
 /* but fp-16 is spilled IMPRECISE zero const reg */
 __msg("4: (7b) *(u64 *)(r10 -16) = r0        ; R0_w=0 R10=fp0 fp-16_w=0")
-/* validate that assigning R2 from STACK_ZERO doesn't mark register
+/* validate that assigning R2 from STACK_SPILL with zero value doesn't mark register
  * precise immediately; if necessary, it will be marked precise later
  */
-__msg("6: (71) r2 = *(u8 *)(r10 -1)          ; R2_w=0 R10=fp0 fp-8_w=00000000")
+__msg("6: (71) r2 = *(u8 *)(r10 -1)          ; R2_w=0 R10=fp0 fp-8_w=0")
 /* similarly, when R2 is assigned from spilled register, it is initially
  * imprecise, but will be marked precise later once it is used in precise context
  */
@@ -520,14 +520,14 @@ __msg("mark_precise: frame0: regs=r0 stack= before 3: (b7) r0 = 0")
 __naked void partial_stack_load_preserves_zeros(void)
 {
 	asm volatile (
-		/* fp-8 is all STACK_ZERO */
+		/* fp-8 is value zero (represented by a zero value fake reg) */
 		".8byte %[fp8_st_zero];" /* LLVM-18+: *(u64 *)(r10 -8) = 0; */
 
 		/* fp-16 is const zero register */
 		"r0 = 0;"
 		"*(u64 *)(r10 -16) = r0;"
 
-		/* load single U8 from non-aligned STACK_ZERO slot */
+		/* load single U8 from non-aligned spilled value zero slot */
 		"r1 = %[single_byte_buf];"
 		"r2 = *(u8 *)(r10 -1);"
 		"r1 += r2;"
@@ -539,7 +539,7 @@ __naked void partial_stack_load_preserves_zeros(void)
 		"r1 += r2;"
 		"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
 
-		/* load single U16 from non-aligned STACK_ZERO slot */
+		/* load single U16 from non-aligned spilled value zero slot */
 		"r1 = %[single_byte_buf];"
 		"r2 = *(u16 *)(r10 -2);"
 		"r1 += r2;"
@@ -551,7 +551,7 @@ __naked void partial_stack_load_preserves_zeros(void)
 		"r1 += r2;"
 		"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
 
-		/* load single U32 from non-aligned STACK_ZERO slot */
+		/* load single U32 from non-aligned spilled value zero slot */
 		"r1 = %[single_byte_buf];"
 		"r2 = *(u32 *)(r10 -4);"
 		"r1 += r2;"
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next v2 2/2] selftests/bpf: Add a selftest with not-8-byte aligned BPF_ST
  2024-01-03 23:26 [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Yonghong Song
@ 2024-01-03 23:26 ` Yonghong Song
  2024-01-04 16:37 ` [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Eduard Zingerman
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2024-01-03 23:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

Add a selftest with a 4-byte BPF_ST of 0 where the store is not
8-byte aligned. The goal is to ensure that STACK_ZERO is properly
marked for the store and that the STACK_ZERO value can propagate
properly during a later load.
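As a rough standalone model (illustrative names, not verifier code),
the 4-byte store only zero-marks the written bytes of the 8-byte slot,
so a later narrow load from those bytes can still read a known zero:

```c
#include <assert.h>

/* Toy model of the fp-8 slot after '*(u32 *)(r10 - 4) = 0': only the
 * four written bytes become BYTE_ZERO; the other four keep whatever
 * mark they had (here BYTE_INVALID, shown as '?' in verifier logs).
 * Illustrative only, not verifier code.
 */
enum bmark { BYTE_INVALID, BYTE_ZERO };

struct slot8 { enum bmark b[8]; }; /* b[0] = fp-8 ... b[7] = fp-1 */

static void st_zero(struct slot8 *s, int off, int size)
{
	for (int i = 0; i < size; i++)
		s->b[8 + off + i] = BYTE_ZERO; /* off is negative, e.g. -4 */
}

/* a narrow load is a known zero iff every loaded byte is BYTE_ZERO */
static int load_is_known_zero(const struct slot8 *s, int off, int size)
{
	for (int i = 0; i < size; i++)
		if (s->b[8 + off + i] != BYTE_ZERO)
			return 0;
	return 1;
}
```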

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/progs/verifier_spill_fill.c | 44 +++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
index d4b3188afe07..6017b26d957d 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
@@ -583,6 +583,50 @@ __naked void partial_stack_load_preserves_zeros(void)
 	: __clobber_common);
 }
 
+SEC("raw_tp")
+__log_level(2)
+__success
+/* fp-4 is STACK_ZERO */
+__msg("2: (62) *(u32 *)(r10 -4) = 0          ; R10=fp0 fp-8=0000????")
+/* validate that assigning R2 from STACK_ZERO with zero value doesn't mark register
+ * precise immediately; if necessary, it will be marked precise later
+ */
+__msg("4: (71) r2 = *(u8 *)(r10 -1)          ; R2_w=0 R10=fp0 fp-8=0000????")
+__msg("5: (0f) r1 += r2")
+__msg("mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1")
+__msg("mark_precise: frame0: regs=r2 stack= before 4: (71) r2 = *(u8 *)(r10 -1)")
+__naked void partial_stack_load_preserves_partial_zeros(void)
+{
+	asm volatile (
+		/* fp-4 is value zero */
+		".8byte %[fp4_st_zero];" /* LLVM-18+: *(u32 *)(r10 -4) = 0; */
+
+		/* load single U8 from non-aligned stack zero slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u8 *)(r10 -1);"
+		"r1 += r2;"
+		"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
+
+		/* load single U16 from non-aligned stack zero slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u16 *)(r10 -2);"
+		"r1 += r2;"
+		"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
+
+		/* load single U32 from non-aligned stack zero slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u32 *)(r10 -4);"
+		"r1 += r2;"
+		"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
+
+		"r0 = 0;"
+		"exit;"
+	:
+	: __imm_ptr(single_byte_buf),
+	  __imm_insn(fp4_st_zero, BPF_ST_MEM(BPF_W, BPF_REG_FP, -4, 0))
+	: __clobber_common);
+}
+
 char two_byte_buf[2] SEC(".data.two_byte_buf");
 
 SEC("raw_tp")
-- 
2.34.1



* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-03 23:26 [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Yonghong Song
  2024-01-03 23:26 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add a selftest with not-8-byte aligned BPF_ST Yonghong Song
@ 2024-01-04 16:37 ` Eduard Zingerman
  2024-01-04 17:13   ` Yonghong Song
  2024-01-04 18:30 ` Eduard Zingerman
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-04 16:37 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Kuniyuki Iwashima, Martin KaFai Lau

On Wed, 2024-01-03 at 15:26 -0800, Yonghong Song wrote:
> With patch set [1], precision backtracing supports register spill/fill
> to/from the stack. The patch [2] allows initial imprecise register spill
> with content 0. This is a common case for cpuv3 and lower for
> initializing the stack variables with pattern
>   r1 = 0
>   *(u64 *)(r10 - 8) = r1
> and the [2] has demonstrated good verification improvement.
> 
> For cpuv4, the initialization could be
>   *(u64 *)(r10 - 8) = 0
> The current verifier marks the r10-8 contents with STACK_ZERO.
> Similar to [2], let us permit the above insn to behave like
> imprecise register spill which can reduce number of verified states.
> The change is in function check_stack_write_fixed_off().

Hi Yonghong,

I agree with this change, but I don't understand under which conditions
current STACK_ZERO logic is sub-optimal.
I tried executing test case from patch #2 w/o applying patch #1 and it passes.
Could you please elaborate / conjure a test case that would fail w/o patch #1?

Thanks,
Eduard

[...]


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-04 16:37 ` [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Eduard Zingerman
@ 2024-01-04 17:13   ` Yonghong Song
  2024-01-04 18:43     ` Eduard Zingerman
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-01-04 17:13 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Kuniyuki Iwashima, Martin KaFai Lau


On 1/4/24 8:37 AM, Eduard Zingerman wrote:
> On Wed, 2024-01-03 at 15:26 -0800, Yonghong Song wrote:
>> With patch set [1], precision backtracing supports register spill/fill
>> to/from the stack. The patch [2] allows initial imprecise register spill
>> with content 0. This is a common case for cpuv3 and lower for
>> initializing the stack variables with pattern
>>    r1 = 0
>>    *(u64 *)(r10 - 8) = r1
>> and the [2] has demonstrated good verification improvement.
>>
>> For cpuv4, the initialization could be
>>    *(u64 *)(r10 - 8) = 0
>> The current verifier marks the r10-8 contents with STACK_ZERO.
>> Similar to [2], let us permit the above insn to behave like
>> imprecise register spill which can reduce number of verified states.
>> The change is in function check_stack_write_fixed_off().
> Hi Yonghong,
>
> I agree with this change, but I don't understand under which conditions
> current STACK_ZERO logic is sub-optimal.
> I tried executing test case from patch #2 w/o applying patch #1 and it passes.
> Could you please elaborate / conjure a test case that would fail w/o patch #1?

The logic is similar to
   https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/

STACK_ZERO logic is sub-optimal in some cases only w.r.t. the number of
verifier states. So there is no correctness issue.

Patch 2 is added in response to Andrii's request in
   https://lore.kernel.org/all/CAEf4BzaWets3fHUGtctwCNWecR9ASRCO2kFagNy8jJZmPBWYDA@mail.gmail.com/
Since with patch 1 the original STACK_ZERO case is converted to STACK_SPILL,
patch 2 is added to cover the STACK_ZERO case. So with or without patch 1,
patch 2 will succeed since it uses the STACK_ZERO logic.


>
> Thanks,
> Eduard
>
> [...]


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-03 23:26 [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Yonghong Song
  2024-01-03 23:26 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add a selftest with not-8-byte aligned BPF_ST Yonghong Song
  2024-01-04 16:37 ` [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Eduard Zingerman
@ 2024-01-04 18:30 ` Eduard Zingerman
  2024-01-04 20:12   ` Yonghong Song
  2024-01-04 23:03 ` Andrii Nakryiko
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-04 18:30 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Kuniyuki Iwashima, Martin KaFai Lau

On Wed, 2024-01-03 at 15:26 -0800, Yonghong Song wrote:

I missed one thing while looking at this patch, please see below.

[...]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d4e31f61de0e..cfe7a68d90a5 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4491,7 +4491,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>  		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
>  			state->stack[spi].spilled_ptr.id = 0;
>  	} else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
> -		   insn->imm != 0 && env->bpf_capable) {
> +		   env->bpf_capable) {
>  		struct bpf_reg_state fake_reg = {};
>  
>  		__mark_reg_known(&fake_reg, insn->imm);
> @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>  
>  	/* Variable offset writes destroy any spilled pointers in range. */
>  	for (i = min_off; i < max_off; i++) {
> +		struct bpf_reg_state *spill_reg;
>  		u8 new_type, *stype;
> -		int slot, spi;
> +		int slot, spi, j;
>  
>  		slot = -i - 1;
>  		spi = slot / BPF_REG_SIZE;
> +
> +		/* If writing_zero and the spi slot contains a spill of value 0,
> +		 * maintain the spill type.
> +		 */
> +		if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {

Talked to Andrii today, and he noted that spilled reg should be marked
precise at this point.

> +			spill_reg = &state->stack[spi].spilled_ptr;
> +			if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {
> +				for (j = BPF_REG_SIZE; j > 0; j--) {
> +					if (state->stack[spi].slot_type[j - 1] != STACK_SPILL)
> +						break;
> +				}
> +				i += BPF_REG_SIZE - j - 1;
> +				continue;
> +			}
> +		}
> +
>  		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
>  		mark_stack_slot_scratched(env, spi);


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-04 17:13   ` Yonghong Song
@ 2024-01-04 18:43     ` Eduard Zingerman
  0 siblings, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-04 18:43 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Kuniyuki Iwashima, Martin KaFai Lau

On Thu, 2024-01-04 at 09:13 -0800, Yonghong Song wrote:
> On 1/4/24 8:37 AM, Eduard Zingerman wrote:

[...]

> > I agree with this change, but I don't understand under which conditions
> > current STACK_ZERO logic is sub-optimal.
> > I tried executing test case from patch #2 w/o applying patch #1 and it passes.
> > Could you please elaborate / conjure a test case that would fail w/o patch #1?
> 
> The logic is similar to
>    https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/
> 
> STACK_ZERO logic is sub-optimal in some cases only w.r.t. the number of
> verifier states. So there is no correctness issue.
> Patch 2 is added in response to Andrii's request in
>    https://lore.kernel.org/all/CAEf4BzaWets3fHUGtctwCNWecR9ASRCO2kFagNy8jJZmPBWYDA@mail.gmail.com/
> Since with patch 1 the original STACK_ZERO case is converted to STACK_SPILL,
> Patch 2 is added to cover STACK_ZERO case. So with or with patch 1, patch 2
> will succeed since it uses STACK_ZERO logic.

Understood, thank you.
Probably no need to add more tests then.

A patch [1] might be related; it handles STACK_ZERO vs zero spill vs
unbound scalar spill on the regsafe() side.

[1] https://lore.kernel.org/bpf/20231220214013.3327288-15-maxtram95@gmail.com/



* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-04 18:30 ` Eduard Zingerman
@ 2024-01-04 20:12   ` Yonghong Song
  2024-01-04 21:10     ` Eduard Zingerman
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-01-04 20:12 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Kuniyuki Iwashima, Martin KaFai Lau


On 1/4/24 10:30 AM, Eduard Zingerman wrote:
> On Wed, 2024-01-03 at 15:26 -0800, Yonghong Song wrote:
>
> I missed one thing while looking at this patch, please see below.
>
> [...]
>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index d4e31f61de0e..cfe7a68d90a5 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -4491,7 +4491,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>>   		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
>>   			state->stack[spi].spilled_ptr.id = 0;
>>   	} else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
>> -		   insn->imm != 0 && env->bpf_capable) {
>> +		   env->bpf_capable) {
>>   		struct bpf_reg_state fake_reg = {};
>>   
>>   		__mark_reg_known(&fake_reg, insn->imm);
>> @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>>   
>>   	/* Variable offset writes destroy any spilled pointers in range. */
>>   	for (i = min_off; i < max_off; i++) {
>> +		struct bpf_reg_state *spill_reg;
>>   		u8 new_type, *stype;
>> -		int slot, spi;
>> +		int slot, spi, j;
>>   
>>   		slot = -i - 1;
>>   		spi = slot / BPF_REG_SIZE;
>> +
>> +		/* If writing_zero and the spi slot contains a spill of value 0,
>> +		 * maintain the spill type.
>> +		 */
>> +		if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
> Talked to Andrii today, and he noted that spilled reg should be marked
> precise at this point.

Could you help explain why?

It looks like we did not mark the reg as precise in the fixed-offset case below:

         if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) {
                 save_register_state(env, state, spi, reg, size);
                 /* Break the relation on a narrowing spill. */
                 if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
                         state->stack[spi].spilled_ptr.id = 0;
         } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
                    insn->imm != 0 && env->bpf_capable) {

I probably missed something about precision tracking...

>
>> +			spill_reg = &state->stack[spi].spilled_ptr;
>> +			if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {
>> +				for (j = BPF_REG_SIZE; j > 0; j--) {
>> +					if (state->stack[spi].slot_type[j - 1] != STACK_SPILL)
>> +						break;
>> +				}
>> +				i += BPF_REG_SIZE - j - 1;
>> +				continue;
>> +			}
>> +		}
>> +
>>   		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
>>   		mark_stack_slot_scratched(env, spi);


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-04 20:12   ` Yonghong Song
@ 2024-01-04 21:10     ` Eduard Zingerman
  2024-01-04 23:09       ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-04 21:10 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
	Martin KaFai Lau, Kuniyuki Iwashima, Martin KaFai Lau

On Thu, 2024-01-04 at 12:12 -0800, Yonghong Song wrote:
[...]
> > > @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
> > >   
> > >   	/* Variable offset writes destroy any spilled pointers in range. */
> > >   	for (i = min_off; i < max_off; i++) {
> > > +		struct bpf_reg_state *spill_reg;
> > >   		u8 new_type, *stype;
> > > -		int slot, spi;
> > > +		int slot, spi, j;
> > >   
> > >   		slot = -i - 1;
> > >   		spi = slot / BPF_REG_SIZE;
> > > +
> > > +		/* If writing_zero and the spi slot contains a spill of value 0,
> > > +		 * maintain the spill type.
> > > +		 */
> > > +		if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
> > Talked to Andrii today, and he noted that spilled reg should be marked
> > precise at this point.
> 
> Could you help explain why?
> 
> Looks we did not mark reg as precise with fixed offset as below:
> 
>          if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) {
>                  save_register_state(env, state, spi, reg, size);
>                  /* Break the relation on a narrowing spill. */
>                  if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
>                          state->stack[spi].spilled_ptr.id = 0;
>          } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
>                     insn->imm != 0 && env->bpf_capable) {
> 
> I probably missed something about precision tracking...

The discussed justification was that if the verifier assumes something
about the value of a scalar (in this case that it is 0), such a scalar
should be marked precise (e.g. this is done for value_regno in
check_stack_write_var_off()).

This seemed logical at the time of discussion; however, I can't come up
with a counter-example at the moment. It appears that, whatever the
assumptions in check_stack_write_var_off() are, if the spill is used in
a precise context it would eventually be marked precise.
E.g. the following is correctly rejected:

SEC("raw_tp")
__log_level(2) __flag(BPF_F_TEST_STATE_FREQ)
__failure
__naked void var_stack_1(void)
{
	asm volatile (
		"call %[bpf_get_prandom_u32];"
		"r9 = 100500;"
		"if r0 > 42 goto +1;"
		"r9 = 0;"
		"*(u64 *)(r10 - 16) = r9;"
		"call %[bpf_get_prandom_u32];"
		"r0 &= 0xf;"
		"r1 = -1;"
		"r1 -= r0;"
		"r2 = r10;"
		"r2 += r1;"
		"r0 = 0;"
		"*(u8 *)(r2 + 0) = r0;"
		"r1 = %[two_byte_buf];"
		"r2 = *(u32 *)(r10 -16);"
		"r1 += r2;"
		"*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
		"exit;"
	:
	: __imm_ptr(two_byte_buf),
	  __imm(bpf_get_prandom_u32)
	: __clobber_common);
}

So now I'm not sure :(
Sorry for too much noise.


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-03 23:26 [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Yonghong Song
                   ` (2 preceding siblings ...)
  2024-01-04 18:30 ` Eduard Zingerman
@ 2024-01-04 23:03 ` Andrii Nakryiko
  2024-01-05  0:49 ` Andrii Nakryiko
  2024-01-08 23:23 ` Andrii Nakryiko
  5 siblings, 0 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2024-01-04 23:03 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Wed, Jan 3, 2024 at 3:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> With patch set [1], precision backtracing supports register spill/fill
> to/from the stack. The patch [2] allows initial imprecise register spill
> with content 0. This is a common case for cpuv3 and lower for
> initializing the stack variables with pattern
>   r1 = 0
>   *(u64 *)(r10 - 8) = r1
> and the [2] has demonstrated good verification improvement.
>
> For cpuv4, the initialization could be
>   *(u64 *)(r10 - 8) = 0
> The current verifier marks the r10-8 contents with STACK_ZERO.
> Similar to [2], let us permit the above insn to behave like
> imprecise register spill which can reduce number of verified states.
> The change is in function check_stack_write_fixed_off().
>
> Before this patch, spilled zero will be marked as STACK_ZERO
> which can provide precise values. In check_stack_write_var_off(),
> STACK_ZERO will be maintained if writing a const zero
> so later it can provide precise values if needed.
>
> The above handling of '*(u64 *)(r10 - 8) = 0' as a spill
> will have issues in check_stack_write_var_off() as the spill
> will be converted to STACK_MISC and the precise value 0
> is lost. To fix this issue, if the spill slots with const
> zero and the BPF_ST write also with const zero, the spill slots
> are preserved, which can later provide precise values
> if needed. Without the change in check_stack_write_var_off(),
> the test_verifier subtest 'BPF_ST_MEM stack imm zero, variable offset'
> will fail.
>
> I checked cpuv3 and cpuv4 with and without this patch with veristat.
> There is no state change for cpuv3 since '*(u64 *)(r10 - 8) = 0'
> is only generated with cpuv4.
>
> For cpuv4:
> $ ../veristat -C old.cpuv4.csv new.cpuv4.csv -e file,prog,insns,states -f 'insns_diff!=0'
> File                                        Program              Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
> ------------------------------------------  -------------------  ---------  ---------  ---------------  ----------  ----------  -------------
> local_storage_bench.bpf.linked3.o           get_local                  228        168    -60 (-26.32%)          17          14   -3 (-17.65%)
> pyperf600_bpf_loop.bpf.linked3.o            on_event                  6066       4889  -1177 (-19.40%)         403         321  -82 (-20.35%)
> test_cls_redirect.bpf.linked3.o             cls_redirect             35483      35387     -96 (-0.27%)        2179        2177    -2 (-0.09%)
> test_l4lb_noinline.bpf.linked3.o            balancer_ingress          4494       4522     +28 (+0.62%)         217         219    +2 (+0.92%)
> test_l4lb_noinline_dynptr.bpf.linked3.o     balancer_ingress          1432       1455     +23 (+1.61%)          92          94    +2 (+2.17%)
> test_xdp_noinline.bpf.linked3.o             balancer_ingress_v6       3462       3458      -4 (-0.12%)         216         216    +0 (+0.00%)
> verifier_iterating_callbacks.bpf.linked3.o  widening                    52         41    -11 (-21.15%)           4           3   -1 (-25.00%)
> xdp_synproxy_kern.bpf.linked3.o             syncookie_tc             12412      11719    -693 (-5.58%)         345         330   -15 (-4.35%)
> xdp_synproxy_kern.bpf.linked3.o             syncookie_xdp            12478      11794    -684 (-5.48%)         346         331   -15 (-4.34%)
>
> test_l4lb_noinline and test_l4lb_noinline_dynptr has minor regression, but
> pyperf600_bpf_loop and local_storage_bench gets pretty good improvement.
>
>   [1] https://lore.kernel.org/all/20231205184248.1502704-1-andrii@kernel.org/
>   [2] https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/
>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  kernel/bpf/verifier.c                         | 21 +++++++++++++++++--
>  .../selftests/bpf/progs/verifier_spill_fill.c | 16 +++++++-------
>  2 files changed, 27 insertions(+), 10 deletions(-)
>
> Changelogs:
>   v1 -> v2:
>     - Preserve with-const-zero spill if writing is also zero
>       in check_stack_write_var_off().
>     - Add a test with not-8-byte-aligned BPF_ST store.
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d4e31f61de0e..cfe7a68d90a5 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4491,7 +4491,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>                 if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
>                         state->stack[spi].spilled_ptr.id = 0;
>         } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
> -                  insn->imm != 0 && env->bpf_capable) {
> +                  env->bpf_capable) {
>                 struct bpf_reg_state fake_reg = {};
>
>                 __mark_reg_known(&fake_reg, insn->imm);
> @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>
>         /* Variable offset writes destroy any spilled pointers in range. */
>         for (i = min_off; i < max_off; i++) {
> +               struct bpf_reg_state *spill_reg;
>                 u8 new_type, *stype;
> -               int slot, spi;
> +               int slot, spi, j;
>
>                 slot = -i - 1;
>                 spi = slot / BPF_REG_SIZE;
> +
> +               /* If writing_zero and the spi slot contains a spill of value 0,
> +                * maintain the spill type.
> +                */
> +               if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
> +                       spill_reg = &state->stack[spi].spilled_ptr;
> +                       if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {

here we assume that *spilled* register is zero and will stay zero,
even if it's imprecise. This is wrong, because imprecise SCALAR 0 is
actually an unknown scalar when doing state pruning. So we need to
either force the spilled register to be precise, or overwrite it with
STACK_MISC.


> +                               for (j = BPF_REG_SIZE; j > 0; j--) {
> +                                       if (state->stack[spi].slot_type[j - 1] != STACK_SPILL)
> +                                               break;
> +                               }
> +                               i += BPF_REG_SIZE - j - 1;
> +                               continue;
> +                       }
> +               }
> +
>                 stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
>                 mark_stack_slot_scratched(env, spi);
>

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-04 21:10     ` Eduard Zingerman
@ 2024-01-04 23:09       ` Andrii Nakryiko
  2024-01-04 23:29         ` Eduard Zingerman
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2024-01-04 23:09 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Thu, Jan 4, 2024 at 1:11 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-01-04 at 12:12 -0800, Yonghong Song wrote:
> [...]
> > > > @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
> > > >
> > > >           /* Variable offset writes destroy any spilled pointers in range. */
> > > >           for (i = min_off; i < max_off; i++) {
> > > > +         struct bpf_reg_state *spill_reg;
> > > >                   u8 new_type, *stype;
> > > > -         int slot, spi;
> > > > +         int slot, spi, j;
> > > >
> > > >                   slot = -i - 1;
> > > >                   spi = slot / BPF_REG_SIZE;
> > > > +
> > > > +         /* If writing_zero and the spi slot contains a spill of value 0,
> > > > +          * maintain the spill type.
> > > > +          */
> > > > +         if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
> > > Talked to Andrii today, and he noted that spilled reg should be marked
> > > precise at this point.
> >
> > Could you help explain why?
> >
> > It looks like we did not mark the reg as precise for a fixed offset, as below:
> >
> >          if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) {
> >                  save_register_state(env, state, spi, reg, size);
> >                  /* Break the relation on a narrowing spill. */
> >                  if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
> >                          state->stack[spi].spilled_ptr.id = 0;
> >          } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
> >                     insn->imm != 0 && env->bpf_capable) {
> >
> > I probably missed something about precision tracking...
>
> The discussed justification was that if the verifier assumes something
> about the value of a scalar (in this case that it is 0), such a scalar
> should be marked precise (e.g. this is done for value_regno in
> check_stack_write_var_off()).
>
> This seemed logical at the time of discussion; however, I can't come up
> with a counterexample at the moment. It appears that whatever the
> assumptions in check_stack_write_var_off() are, if a spill is used in a
> precise context it would eventually be marked.
> E.g. the following is correctly rejected:
>
> SEC("raw_tp")
> __log_level(2) __flag(BPF_F_TEST_STATE_FREQ)
> __failure
> __naked void var_stack_1(void)
> {
>         asm volatile (
>                 "call %[bpf_get_prandom_u32];"
>                 "r9 = 100500;"
>                 "if r0 > 42 goto +1;"
>                 "r9 = 0;"
>                 "*(u64 *)(r10 - 16) = r9;"
>                 "call %[bpf_get_prandom_u32];"
>                 "r0 &= 0xf;"
>                 "r1 = -1;"
>                 "r1 -= r0;"
>                 "r2 = r10;"
>                 "r2 += r1;"
>                 "r0 = 0;"
>                 "*(u8 *)(r2 + 0) = r0;"
>                 "r1 = %[two_byte_buf];"
>                 "r2 = *(u32 *)(r10 -16);"
>                 "r1 += r2;"
>                 "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
>                 "exit;"
>         :
>         : __imm_ptr(two_byte_buf),
>           __imm(bpf_get_prandom_u32)
>         : __clobber_common);
> }
>
> So now I'm not sure :(
> Sorry for too much noise.


hm... does that test have to do so many things and do all these u64 vs
u32 vs u8 conversions? Can we try a simple test where we spill a u64
SCALAR (imprecise) zero register to fp-8 or fp-16, and then use that
fp-8|fp-16 slot as an index into an array in a precise context. Then
have a separate delayed branch that will write non-zero to fp-8|fp-16.
States shouldn't converge and this should be rejected.


Yonghong, the reason the fixed-offset stack write works is that we know
exactly which stack slot the spilled register is in, so we can backtrack
and mark it as precise if necessary. With a variable-offset stack access
there is no single stack slot (in the general case), so we lose the link
to that spilled register. So we need to either eagerly mark spilled
registers as precise or just do STACK_MISC kind of logic.


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-04 23:09       ` Andrii Nakryiko
@ 2024-01-04 23:29         ` Eduard Zingerman
  2024-01-05  1:05           ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-04 23:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Thu, 2024-01-04 at 15:09 -0800, Andrii Nakryiko wrote:
[...]
> > This seemed logical at the time of discussion; however, I can't come up
> > with a counterexample at the moment. It appears that whatever the
> > assumptions in check_stack_write_var_off() are, if a spill is used in a
> > precise context it would eventually be marked.
> > E.g. the following is correctly rejected:
> > 
> > SEC("raw_tp")
> > __log_level(2) __flag(BPF_F_TEST_STATE_FREQ)
> > __failure
> > __naked void var_stack_1(void)
> > {
> >         asm volatile (
> >                 "call %[bpf_get_prandom_u32];"
> >                 "r9 = 100500;"
> >                 "if r0 > 42 goto +1;"
> >                 "r9 = 0;"
> >                 "*(u64 *)(r10 - 16) = r9;"
> >                 "call %[bpf_get_prandom_u32];"
> >                 "r0 &= 0xf;"
> >                 "r1 = -1;"
> >                 "r1 -= r0;"
> >                 "r2 = r10;"
> >                 "r2 += r1;"
> >                 "r0 = 0;"
> >                 "*(u8 *)(r2 + 0) = r0;"
> >                 "r1 = %[two_byte_buf];"
> >                 "r2 = *(u32 *)(r10 -16);"
> >                 "r1 += r2;"
> >                 "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
> >                 "exit;"
> >         :
> >         : __imm_ptr(two_byte_buf),
> >           __imm(bpf_get_prandom_u32)
> >         : __clobber_common);
> > }
> > 
> > So now I'm not sure :(
> > Sorry for too much noise.
> 
> 
> hm... does that test have to do so many things and do all these u64 vs
> u32 vs u8 conversions?

The test is actually quite minimal; the longest part is the conjuring of
a varying-offset pointer in r2. Here it is with additional comments:

    /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */
    "call %[bpf_get_prandom_u32];"
    "r9 = 100500;"
    "if r0 > 42 goto +1;"
    "r9 = 0;"
    "*(u64 *)(r10 - 16) = r9;"
    /* prepare a variable length access */
    "call %[bpf_get_prandom_u32];"
    "r0 &= 0xf;" /* r0 range is [0; 15] */
    "r1 = -1;"
    "r1 -= r0;"  /* r1 range is [-16; -1] */
    "r2 = r10;"
    "r2 += r1;"  /* r2 range is [fp-16; fp-1] */
    /* do a variable length write of constant 0 */
    "r0 = 0;"
    "*(u8 *)(r2 + 0) = r0;"
    /* use fp-16 to access an array of length 2 */
    "r1 = %[two_byte_buf];"
    "r2 = *(u32 *)(r10 -16);"
    "r1 += r2;"
    "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
    "exit;"

> Can we try a simple test were we spill u64
> SCALAR (imprecise) zero register to fp-8 or fp-16, and then use those
> fp-8|fp-16 slot as an index into an array in precise context. Then
> have a separate delayed branch that will write non-zero to fp-8|fp-16.
> States shouldn't converge and this should be rejected.

That is what the test above does, but it also includes a varying-offset access.

[...]


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-03 23:26 [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Yonghong Song
                   ` (3 preceding siblings ...)
  2024-01-04 23:03 ` Andrii Nakryiko
@ 2024-01-05  0:49 ` Andrii Nakryiko
  2024-01-08 23:23 ` Andrii Nakryiko
  5 siblings, 0 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2024-01-05  0:49 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Wed, Jan 3, 2024 at 3:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> With patch set [1], precision backtracing supports register spill/fill
> to/from the stack. The patch [2] allows initial imprecise register spill
> with content 0. This is a common case for cpuv3 and lower for
> initializing the stack variables with pattern
>   r1 = 0
>   *(u64 *)(r10 - 8) = r1
> and the [2] has demonstrated good verification improvement.
>
> For cpuv4, the initialization could be
>   *(u64 *)(r10 - 8) = 0
> The current verifier marks the r10-8 contents with STACK_ZERO.
> Similar to [2], let us permit the above insn to behave like
> imprecise register spill which can reduce number of verified states.
> The change is in function check_stack_write_fixed_off().
>
> Before this patch, spilled zero will be marked as STACK_ZERO
> which can provide precise values. In check_stack_write_var_off(),
> STACK_ZERO will be maintained if writing a const zero
> so later it can provide precise values if needed.
>
> The above handling of '*(u64 *)(r10 - 8) = 0' as a spill
> will have issues in check_stack_write_var_off() as the spill
> will be converted to STACK_MISC and the precise value 0
> is lost. To fix this issue, if the spill slots with const
> zero and the BPF_ST write also with const zero, the spill slots
> are preserved, which can later provide precise values
> if needed. Without the change in check_stack_write_var_off(),
> the test_verifier subtest 'BPF_ST_MEM stack imm zero, variable offset'
> will fail.
>
> I checked cpuv3 and cpuv4 with and without this patch with veristat.
> There is no state change for cpuv3 since '*(u64 *)(r10 - 8) = 0'
> is only generated with cpuv4.
>
> For cpuv4:
> $ ../veristat -C old.cpuv4.csv new.cpuv4.csv -e file,prog,insns,states -f 'insns_diff!=0'
> File                                        Program              Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
> ------------------------------------------  -------------------  ---------  ---------  ---------------  ----------  ----------  -------------
> local_storage_bench.bpf.linked3.o           get_local                  228        168    -60 (-26.32%)          17          14   -3 (-17.65%)
> pyperf600_bpf_loop.bpf.linked3.o            on_event                  6066       4889  -1177 (-19.40%)         403         321  -82 (-20.35%)
> test_cls_redirect.bpf.linked3.o             cls_redirect             35483      35387     -96 (-0.27%)        2179        2177    -2 (-0.09%)
> test_l4lb_noinline.bpf.linked3.o            balancer_ingress          4494       4522     +28 (+0.62%)         217         219    +2 (+0.92%)
> test_l4lb_noinline_dynptr.bpf.linked3.o     balancer_ingress          1432       1455     +23 (+1.61%)          92          94    +2 (+2.17%)
> test_xdp_noinline.bpf.linked3.o             balancer_ingress_v6       3462       3458      -4 (-0.12%)         216         216    +0 (+0.00%)
> verifier_iterating_callbacks.bpf.linked3.o  widening                    52         41    -11 (-21.15%)           4           3   -1 (-25.00%)
> xdp_synproxy_kern.bpf.linked3.o             syncookie_tc             12412      11719    -693 (-5.58%)         345         330   -15 (-4.35%)
> xdp_synproxy_kern.bpf.linked3.o             syncookie_xdp            12478      11794    -684 (-5.48%)         346         331   -15 (-4.34%)
>
> test_l4lb_noinline and test_l4lb_noinline_dynptr have minor regressions, but
> pyperf600_bpf_loop and local_storage_bench get pretty good improvements.
>
>   [1] https://lore.kernel.org/all/20231205184248.1502704-1-andrii@kernel.org/
>   [2] https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/
>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  kernel/bpf/verifier.c                         | 21 +++++++++++++++++--
>  .../selftests/bpf/progs/verifier_spill_fill.c | 16 +++++++-------
>  2 files changed, 27 insertions(+), 10 deletions(-)
>
> Changelogs:
>   v1 -> v2:
>     - Preserve with-const-zero spill if writing is also zero
>       in check_stack_write_var_off().
>     - Add a test with not-8-byte-aligned BPF_ST store.
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d4e31f61de0e..cfe7a68d90a5 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4491,7 +4491,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>                 if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
>                         state->stack[spi].spilled_ptr.id = 0;
>         } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
> -                  insn->imm != 0 && env->bpf_capable) {
> +                  env->bpf_capable) {
>                 struct bpf_reg_state fake_reg = {};
>
>                 __mark_reg_known(&fake_reg, insn->imm);
> @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>
>         /* Variable offset writes destroy any spilled pointers in range. */
>         for (i = min_off; i < max_off; i++) {
> +               struct bpf_reg_state *spill_reg;
>                 u8 new_type, *stype;
> -               int slot, spi;
> +               int slot, spi, j;
>
>                 slot = -i - 1;
>                 spi = slot / BPF_REG_SIZE;
> +
> +               /* If writing_zero and the spi slot contains a spill of value 0,
> +                * maintain the spill type.
> +                */
> +               if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
> +                       spill_reg = &state->stack[spi].spilled_ptr;
> +                       if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {
> +                               for (j = BPF_REG_SIZE; j > 0; j--) {
> +                                       if (state->stack[spi].slot_type[j - 1] != STACK_SPILL)
> +                                               break;
> +                               }
> +                               i += BPF_REG_SIZE - j - 1;
> +                               continue;
> +                       }
> +               }
> +
>                 stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
>                 mark_stack_slot_scratched(env, spi);
>

you are skipping this in some situations, which makes debugging harder
because we won't see the state of all affected slots, so please move it
up so it runs right after calculating spi

> diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
> index 39fe3372e0e0..d4b3188afe07 100644
> --- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
> +++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
> @@ -495,14 +495,14 @@ char single_byte_buf[1] SEC(".data.single_byte_buf");
>  SEC("raw_tp")
>  __log_level(2)
>  __success

[...]


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-04 23:29         ` Eduard Zingerman
@ 2024-01-05  1:05           ` Andrii Nakryiko
  2024-01-05  7:14             ` Yonghong Song
  2024-01-05 23:52             ` Eduard Zingerman
  0 siblings, 2 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2024-01-05  1:05 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Thu, Jan 4, 2024 at 3:29 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-01-04 at 15:09 -0800, Andrii Nakryiko wrote:
> [...]
> > > This seemed logical at the time of discussion; however, I can't come up
> > > with a counterexample at the moment. It appears that whatever the
> > > assumptions in check_stack_write_var_off() are, if a spill is used in a
> > > precise context it would eventually be marked.
> > > E.g. the following is correctly rejected:
> > >
> > > SEC("raw_tp")
> > > __log_level(2) __flag(BPF_F_TEST_STATE_FREQ)
> > > __failure
> > > __naked void var_stack_1(void)
> > > {
> > >         asm volatile (
> > >                 "call %[bpf_get_prandom_u32];"
> > >                 "r9 = 100500;"
> > >                 "if r0 > 42 goto +1;"
> > >                 "r9 = 0;"
> > >                 "*(u64 *)(r10 - 16) = r9;"
> > >                 "call %[bpf_get_prandom_u32];"
> > >                 "r0 &= 0xf;"
> > >                 "r1 = -1;"
> > >                 "r1 -= r0;"
> > >                 "r2 = r10;"
> > >                 "r2 += r1;"
> > >                 "r0 = 0;"
> > >                 "*(u8 *)(r2 + 0) = r0;"
> > >                 "r1 = %[two_byte_buf];"
> > >                 "r2 = *(u32 *)(r10 -16);"
> > >                 "r1 += r2;"
> > >                 "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
> > >                 "exit;"
> > >         :
> > >         : __imm_ptr(two_byte_buf),
> > >           __imm(bpf_get_prandom_u32)
> > >         : __clobber_common);
> > > }
> > >
> > > So now I'm not sure :(
> > > Sorry for too much noise.
> >
> >
> > hm... does that test have to do so many things and do all these u64 vs
> > u32 vs u8 conversions?
>
> The test is actually quite minimal; the longest part is the conjuring of
> a varying-offset pointer in r2. Here it is with additional comments:
>
>     /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */
>     "call %[bpf_get_prandom_u32];"
>     "r9 = 100500;"
>     "if r0 > 42 goto +1;"
>     "r9 = 0;"
>     "*(u64 *)(r10 - 16) = r9;"
>     /* prepare a variable length access */
>     "call %[bpf_get_prandom_u32];"
>     "r0 &= 0xf;" /* r0 range is [0; 15] */
>     "r1 = -1;"
>     "r1 -= r0;"  /* r1 range is [-16; -1] */
>     "r2 = r10;"
>     "r2 += r1;"  /* r2 range is [fp-16; fp-1] */
>     /* do a variable length write of constant 0 */
>     "r0 = 0;"
>     "*(u8 *)(r2 + 0) = r0;"

I meant this u8

>     /* use fp-16 to access an array of length 2 */
>     "r1 = %[two_byte_buf];"
>     "r2 = *(u32 *)(r10 -16);"

and this u32. I'm not saying it's anything wrong, but it's simpler to
deal with u64 consistently. There is nothing wrong with the test per
se, I'm just saying we should try to eliminate unnecessary cross-plays
with narrowing/widening stores/loads.

But that's offtopic, sorry.

>     "r1 += r2;"
>     "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
>     "exit;"
>
> > Can we try a simple test where we spill a u64
> > SCALAR (imprecise) zero register to fp-8 or fp-16, and then use that
> > fp-8|fp-16 slot as an index into an array in a precise context. Then
> > have a separate delayed branch that will write non-zero to fp-8|fp-16.
> > States shouldn't converge and this should be rejected.
>
> That is what the test above does, but it also includes a varying-offset access.
>

Yes, and the test fails, but if you read the log, you'll see that fp-8
is never marked precise, but it should be. So we need a more elaborate
test that would somehow exploit fp-8 imprecision.

I ran out of time. But what I tried was replacing


"r2 = *(u32 *)(r10 -16);"

with

"r2 = *(u8 *)(r2 +0);"

So keep both the read and the write as variable offset. And we are saved
by some missing logic in read_var_off that would otherwise set r2 as
known zero (because it should be zero for the branch where both fp-8 and
fp-16 are zero). But that fails in the branch that should succeed, and
if that branch actually succeeded, I suspect the branch where we
initialize with non-zero r9 would erroneously succeed.

Anyways, I still claim that we are mishandling the precision of a
spilled register when doing zero var_off writes.



> [...]


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-05  1:05           ` Andrii Nakryiko
@ 2024-01-05  7:14             ` Yonghong Song
  2024-01-05  8:10               ` Yonghong Song
  2024-01-05 23:37               ` Eduard Zingerman
  2024-01-05 23:52             ` Eduard Zingerman
  1 sibling, 2 replies; 26+ messages in thread
From: Yonghong Song @ 2024-01-05  7:14 UTC (permalink / raw)
  To: Andrii Nakryiko, Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau


On 1/4/24 5:05 PM, Andrii Nakryiko wrote:
> On Thu, Jan 4, 2024 at 3:29 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>> On Thu, 2024-01-04 at 15:09 -0800, Andrii Nakryiko wrote:
>> [...]
>>>> This seemed logical at the time of discussion; however, I can't come up
>>>> with a counterexample at the moment. It appears that whatever the
>>>> assumptions in check_stack_write_var_off() are, if a spill is used in a
>>>> precise context it would eventually be marked.
>>>> E.g. the following is correctly rejected:
>>>>
>>>> SEC("raw_tp")
>>>> __log_level(2) __flag(BPF_F_TEST_STATE_FREQ)
>>>> __failure
>>>> __naked void var_stack_1(void)
>>>> {
>>>>          asm volatile (
>>>>                  "call %[bpf_get_prandom_u32];"
>>>>                  "r9 = 100500;"
>>>>                  "if r0 > 42 goto +1;"
>>>>                  "r9 = 0;"
>>>>                  "*(u64 *)(r10 - 16) = r9;"
>>>>                  "call %[bpf_get_prandom_u32];"
>>>>                  "r0 &= 0xf;"
>>>>                  "r1 = -1;"
>>>>                  "r1 -= r0;"
>>>>                  "r2 = r10;"
>>>>                  "r2 += r1;"
>>>>                  "r0 = 0;"
>>>>                  "*(u8 *)(r2 + 0) = r0;"
>>>>                  "r1 = %[two_byte_buf];"
>>>>                  "r2 = *(u32 *)(r10 -16);"
>>>>                  "r1 += r2;"
>>>>                  "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
>>>>                  "exit;"
>>>>          :
>>>>          : __imm_ptr(two_byte_buf),
>>>>            __imm(bpf_get_prandom_u32)
>>>>          : __clobber_common);
>>>> }
>>>>
>>>> So now I'm not sure :(
>>>> Sorry for too much noise.
>>>
>>> hm... does that test have to do so many things and do all these u64 vs
>>> u32 vs u8 conversions?
>> The test is actually quite minimal; the longest part is the conjuring of
>> a varying-offset pointer in r2. Here it is with additional comments:
>>
>>      /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */
>>      "call %[bpf_get_prandom_u32];"
>>      "r9 = 100500;"
>>      "if r0 > 42 goto +1;"
>>      "r9 = 0;"
>>      "*(u64 *)(r10 - 16) = r9;"
>>      /* prepare a variable length access */
>>      "call %[bpf_get_prandom_u32];"
>>      "r0 &= 0xf;" /* r0 range is [0; 15] */
>>      "r1 = -1;"
>>      "r1 -= r0;"  /* r1 range is [-16; -1] */
>>      "r2 = r10;"
>>      "r2 += r1;"  /* r2 range is [fp-16; fp-1] */
>>      /* do a variable length write of constant 0 */
>>      "r0 = 0;"
>>      "*(u8 *)(r2 + 0) = r0;"
> I meant this u8
>
>>      /* use fp-16 to access an array of length 2 */
>>      "r1 = %[two_byte_buf];"
>>      "r2 = *(u32 *)(r10 -16);"
> and this u32. I'm not saying it's anything wrong, but it's simpler to
> deal with u64 consistently. There is nothing wrong with the test per
> se, I'm just saying we should try to eliminate unnecessary cross-plays
> with narrowing/widening stores/loads.
>
> But that's offtopic, sorry.
>
>>      "r1 += r2;"
>>      "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
>>      "exit;"
>>
>>> Can we try a simple test where we spill a u64
>>> SCALAR (imprecise) zero register to fp-8 or fp-16, and then use that
>>> fp-8|fp-16 slot as an index into an array in a precise context. Then
>>> have a separate delayed branch that will write non-zero to fp-8|fp-16.
>>> States shouldn't converge and this should be rejected.
>> That is what the test above does, but it also includes a varying-offset access.
>>
> Yes, and the test fails, but if you read the log, you'll see that fp-8
> is never marked precise, but it should. So we need more elaborate test
> that would somehow exploit fp-8 imprecision.
>
> I ran out of time. But what I tried was replacing
>
>
> "r2 = *(u32 *)(r10 -16);"
>
> with
>
> "r2 = *(u8 *)(r2 +0);"
>
> So keep both the read and the write as variable offset. And we are saved
> by some missing logic in read_var_off that would otherwise set r2 as
> known zero (because it should be zero for the branch where both fp-8 and
> fp-16 are zero). But that fails in the branch that should succeed, and
> if that branch actually succeeded, I suspect the branch where we
> initialize with non-zero r9 would erroneously succeed.

I did some experiments but am still confused.
With the current patch set and Andrii's suggested changes above, we have:
...
13: R1_w=scalar(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 0xf)) R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 0xf))
13: (b7) r0 = 0                       ; R0_w=0
14: (73) *(u8 *)(r2 +0) = r0          ; R0_w=0 R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 0xf)) fp-8=mmmmmmmm fp-16=0
15: (bf) r1 = r6                      ; R1_w=map_value(map=.data.two_byte_,ks=4,vs=2) R6=map_value(map=.data.two_byte_,ks=4,vs=2)
16: (71) r2 = *(u8 *)(r2 +0)          ; R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) fp-16=0
17: (0f) r1 += r2
mark_precise: frame0: last_idx 17 first_idx 8 subseq_idx -1
mark_precise: frame0: regs=r2 stack= before 16: (71) r2 = *(u8 *)(r2 +0)
18: R1_w=map_value(map=.data.two_byte_,ks=4,vs=2,smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
18: (73) *(u8 *)(r1 +0) = r2
invalid access to map value, value_size=2 off=255 size=1
R1 max value is outside of the allowed memory range

Now, let us add the precision marking:
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4619,11 +4619,18 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
  
                 slot = -i - 1;
                 spi = slot / BPF_REG_SIZE;
+               mark_stack_slot_scratched(env, spi);
  
                /* If writing_zero and the spi slot contains a spill of value 0,
                  * maintain the spill type.
                  */
                 if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
+                       if (value_regno >= 0) {
+                               err = mark_chain_precision(env, value_regno);
+                               if (err)
+                                       return err;
+                       }
+
                         spill_reg = &state->stack[spi].spilled_ptr;
                         if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {
                                 for (j = BPF_REG_SIZE; j > 0; j--) {
@@ -4636,7 +4643,6 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
                 }
  
                 stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
-               mark_stack_slot_scratched(env, spi);
  
                 if (!env->allow_ptr_leaks && *stype != STACK_MISC && *stype != STACK_ZERO) {
                         /* Reject the write if range we may write to has not


With the above change, the verifier output is:
...
13: R1_w=scalar(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 0xf)) R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 0xf))
13: (b7) r0 = 0                       ; R0_w=0
14: (73) *(u8 *)(r2 +0) = r0
mark_precise: frame0: last_idx 14 first_idx 8 subseq_idx -1
mark_precise: frame0: regs=r0 stack= before 13: (b7) r0 = 0
     <==== added precision marking for the value register
15: R0_w=0 R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 0xf)) fp-8=mmmmmmmm fp-16=0
15: (bf) r1 = r6                      ; R1_w=map_value(map=.data.two_byte_,ks=4,vs=2) R6=map_value(map=.data.two_byte_,ks=4,vs=2)
16: (71) r2 = *(u8 *)(r2 +0)          ; R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) fp-16=0
17: (0f) r1 += r2
mark_precise: frame0: last_idx 17 first_idx 8 subseq_idx -1
mark_precise: frame0: regs=r2 stack= before 16: (71) r2 = *(u8 *)(r2 +0)
18: R1_w=map_value(map=.data.two_byte_,ks=4,vs=2,smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
18: (73) *(u8 *)(r1 +0) = r2
invalid access to map value, value_size=2 off=255 size=1
R1 max value is outside of the allowed memory range

Note that we do have precision marking for register r0 at insn 14.
But backtracking at insn 17 stops at insn 16 and does not reach back
to insn 14, so the precision marking is not really needed in this
particular case. Maybe I missed something here.

There is an alternative implementation in check_stack_write_var_off().
For a spill of value/reg 0, we can convert it to STACK_ZERO instead
of trying to maintain STACK_SPILL. If we convert it to STACK_ZERO,
then we can reuse the rest of the logic in check_stack_write_var_off(),
and at the end we have:

         if (zero_used) {
                 /* backtracking doesn't work for STACK_ZERO yet. */
                 err = mark_chain_precision(env, value_regno);
                 if (err)
                         return err;
         }

although I do not fully understand the above either. Need to go back to
git history to find why.


>
> Anyways, I still claim that we are mishandling a precision of spilled
> register when doing zero var_off writes.
>
>
>
>> [...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-05  7:14             ` Yonghong Song
@ 2024-01-05  8:10               ` Yonghong Song
  2024-01-05 23:37               ` Eduard Zingerman
  1 sibling, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2024-01-05  8:10 UTC (permalink / raw)
  To: Andrii Nakryiko, Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau


On 1/4/24 11:14 PM, Yonghong Song wrote:
>
> On 1/4/24 5:05 PM, Andrii Nakryiko wrote:
>> On Thu, Jan 4, 2024 at 3:29 PM Eduard Zingerman <eddyz87@gmail.com> 
>> wrote:
>>> On Thu, 2024-01-04 at 15:09 -0800, Andrii Nakryiko wrote:
>>> [...]
>>>>> This seemed logical at the time of discussion, however, I can't 
>>>>> figure
>>>>> a counter example at the moment. It appears that whatever are
>>>>> assumptions in check_stack_write_var_off() if spill is used in the
>>>>> precise context it would be marked eventually.
>>>>> E.g. the following is correctly rejected:
>>>>>
>>>>> SEC("raw_tp")
>>>>> __log_level(2) __flag(BPF_F_TEST_STATE_FREQ)
>>>>> __failure
>>>>> __naked void var_stack_1(void)
>>>>> {
>>>>>          asm volatile (
>>>>>                  "call %[bpf_get_prandom_u32];"
>>>>>                  "r9 = 100500;"
>>>>>                  "if r0 > 42 goto +1;"
>>>>>                  "r9 = 0;"
>>>>>                  "*(u64 *)(r10 - 16) = r9;"
>>>>>                  "call %[bpf_get_prandom_u32];"
>>>>>                  "r0 &= 0xf;"
>>>>>                  "r1 = -1;"
>>>>>                  "r1 -= r0;"
>>>>>                  "r2 = r10;"
>>>>>                  "r2 += r1;"
>>>>>                  "r0 = 0;"
>>>>>                  "*(u8 *)(r2 + 0) = r0;"
>>>>>                  "r1 = %[two_byte_buf];"
>>>>>                  "r2 = *(u32 *)(r10 -16);"
>>>>>                  "r1 += r2;"
>>>>>                  "*(u8 *)(r1 + 0) = r2;" /* this should not be 
>>>>> fine */
>>>>>                  "exit;"
>>>>>          :
>>>>>          : __imm_ptr(two_byte_buf),
>>>>>            __imm(bpf_get_prandom_u32)
>>>>>          : __clobber_common);
>>>>> }
>>>>>
>>>>> So now I'm not sure :(
>>>>> Sorry for too much noise.
>>>>
>>>> hm... does that test have to do so many things and do all these u64 vs
>>>> u32 vs u8 conversions?
>>> The test is actually quite minimal, the longest part is conjuring of
>>> varying offset pointer in r2, here it is with additional comments:
>>>
>>>      /* Write 0 or 100500 to fp-16, 0 is on the first verification 
>>> pass */
>>>      "call %[bpf_get_prandom_u32];"
>>>      "r9 = 100500;"
>>>      "if r0 > 42 goto +1;"
>>>      "r9 = 0;"
>>>      "*(u64 *)(r10 - 16) = r9;"
>>>      /* prepare a variable length access */
>>>      "call %[bpf_get_prandom_u32];"
>>>      "r0 &= 0xf;" /* r0 range is [0; 15] */
>>>      "r1 = -1;"
>>>      "r1 -= r0;"  /* r1 range is [-16; -1] */
>>>      "r2 = r10;"
>>>      "r2 += r1;"  /* r2 range is [fp-16; fp-1] */
>>>      /* do a variable length write of constant 0 */
>>>      "r0 = 0;"
>>>      "*(u8 *)(r2 + 0) = r0;"
>> I meant this u8
>>
>>>      /* use fp-16 to access an array of length 2 */
>>>      "r1 = %[two_byte_buf];"
>>>      "r2 = *(u32 *)(r10 -16);"
>> and this u32. I'm not saying it's anything wrong, but it's simpler to
>> deal with u64 consistently. There is nothing wrong with the test per
>> se, I'm just saying we should try to eliminate unnecessary cross-plays
>> with narrowing/widening stores/loads.
>>
>> But that's offtopic, sorry.
>>
>>>      "r1 += r2;"
>>>      "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */
>>>      "exit;"
>>>
>>>> Can we try a simple test were we spill u64
>>>> SCALAR (imprecise) zero register to fp-8 or fp-16, and then use those
>>>> fp-8|fp-16 slot as an index into an array in precise context. Then
>>>> have a separate delayed branch that will write non-zero to fp-8|fp-16.
>>>> States shouldn't converge and this should be rejected.
>>> That is what test above does but it also includes varying offset 
>>> access.
>>>
>> Yes, and the test fails, but if you read the log, you'll see that fp-8
>> is never marked precise, but it should. So we need more elaborate test
>> that would somehow exploit fp-8 imprecision.
>>
>> I ran out of time. But what I tried was replacing
>>
>>
>> "r2 = *(u32 *)(r10 -16);"
>>
>> with
>>
>> "r2 = *(u8 *)(r2 +0);"
>>
>> So keep both read and write as variable offset. And we are saved by
>> some missing logic in read_var_off that would set r2 as known zero
>> (because it should be for the branch where both fp-8 and fp-16 are
>> zero). But that fails in the branch that should succeed, and if that
>> branch actually succeeds, I suspect the branch where we initialize
>> with non-zero r9 will erroneously succeed.
>
> I did some experiments but am still confused.
> With the current patch set and Andrii's suggested changes above,
> we have
> ...
> 13: 
> R1_w=scalar(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 
> 0xf)) 
> R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 
> 0xf))
> 13: (b7) r0 = 0                       ; R0_w=0
> 14: (73) *(u8 *)(r2 +0) = r0          ; R0_w=0 
> R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 
> 0xf)) fp-8=mmmmmmmm fp-16=0
> 15: (bf) r1 = r6                      ; 
> R1_w=map_value(map=.data.two_byte_,ks=4,vs=2) 
> R6=map_value(map=.data.two_byte_,ks=4,vs=2)
> 16: (71) r2 = *(u8 *)(r2 +0)          ; 
> R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 
> 0xff)) fp-16=0
> 17: (0f) r1 += r2
> mark_precise: frame0: last_idx 17 first_idx 8 subseq_idx -1
> mark_precise: frame0: regs=r2 stack= before 16: (71) r2 = *(u8 *)(r2 +0)
> 18: 
> R1_w=map_value(map=.data.two_byte_,ks=4,vs=2,smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 
> 0xff)) 
> R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 
> 0xff))
> 18: (73) *(u8 *)(r1 +0) = r2
> invalid access to map value, value_size=2 off=255 size=1
> R1 max value is outside of the allowed memory range
>
> Now, let us add the precision marking,
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4619,11 +4619,18 @@ static int check_stack_write_var_off(struct 
> bpf_verifier_env *env,
>
>                 slot = -i - 1;
>                 spi = slot / BPF_REG_SIZE;
> +               mark_stack_slot_scratched(env, spi);
>
>                 /* If writing_zero and the spi slot contains a spill of value 0,
>                  * maintain the spill type.
>                  */
>                 if (writing_zero && !(i % BPF_REG_SIZE) && 
> is_spilled_scalar_reg(&state->stack[spi])) {
> +                       if (value_regno >= 0) {
> +                               err = mark_chain_precision(env, 
> value_regno);
> +                               if (err)
> +                                       return err;
> +                       }
> +
>                         spill_reg = &state->stack[spi].spilled_ptr;
>                         if (tnum_is_const(spill_reg->var_off) && 
> spill_reg->var_off.value == 0) {
>                                 for (j = BPF_REG_SIZE; j > 0; j--) {
> @@ -4636,7 +4643,6 @@ static int check_stack_write_var_off(struct 
> bpf_verifier_env *env,
>                 }
>
>                 stype = &state->stack[spi].slot_type[slot % 
> BPF_REG_SIZE];
> -               mark_stack_slot_scratched(env, spi);
>
>                 if (!env->allow_ptr_leaks && *stype != STACK_MISC && 
> *stype != STACK_ZERO) {
>                         /* Reject the write if range we may write to 
> has not
>
>
> With the above change, the verifier output:
> ...
> 13: 
> R1_w=scalar(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 
> 0xf)) 
> R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 
> 0xf))
> 13: (b7) r0 = 0                       ; R0_w=0
> 14: (73) *(u8 *)(r2 +0) = r0
> mark_precise: frame0: last_idx 14 first_idx 8 subseq_idx -1
> mark_precise: frame0: regs=r0 stack= before 13: (b7) r0 = 0
>     <==== added precision marking for the value register
> 15: R0_w=0 
> R2_w=fp(smin=smin32=-16,smax=smax32=-1,umin=0xfffffffffffffff0,umin32=0xfffffff0,var_off=(0xfffffffffffffff0; 
> 0xf)) fp-8=mmmmmmmm fp-16=0
> 15: (bf) r1 = r6                      ; 
> R1_w=map_value(map=.data.two_byte_,ks=4,vs=2) 
> R6=map_value(map=.data.two_byte_,ks=4,vs=2)
> 16: (71) r2 = *(u8 *)(r2 +0)          ; 
> R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 
> 0xff)) fp-16=0
> 17: (0f) r1 += r2
> mark_precise: frame0: last_idx 17 first_idx 8 subseq_idx -1
> mark_precise: frame0: regs=r2 stack= before 16: (71) r2 = *(u8 *)(r2 +0)
> 18: 
> R1_w=map_value(map=.data.two_byte_,ks=4,vs=2,smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 
> 0xff)) 
> R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 
> 0xff))
> 18: (73) *(u8 *)(r1 +0) = r2
> invalid access to map value, value_size=2 off=255 size=1
> R1 max value is outside of the allowed memory range
>
> Note that we do have precision marking for register r0 at insn 14.
> But backtracking at insn 17 stops at insn 16 and it did not reach back
> to insn 14, so precision marking is not really needed in this particular
> case. Maybe I missed something here.
>
> There is an alternative implementation in check_stack_write_var_off().
> For a spill of value/reg 0, we can convert it to STACK_ZERO instead
> of trying to maintain STACK_SPILL. If we convert it to STACK_ZERO,
> then we can reuse the rest of logic in check_stack_write_var_off()
> and at the end we have
>
>         if (zero_used) {
>                 /* backtracking doesn't work for STACK_ZERO yet. */
>                 err = mark_chain_precision(env, value_regno);
>                 if (err)
>                         return err;
>         }
>
> although I do not fully understand the above either. Need to go back to
> git history to find why.

Did some investigation with selftests. A test_progs/test_progs-cpuv4 run
did not hit the above mark_chain_precision call. Only one test in test_verifier
hits the above code.

{
         "BPF_ST_MEM stack imm zero, variable offset",
         .insns = {
         /* set fp[-16], fp[-24] to zeros */
         BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
         BPF_ST_MEM(BPF_DW, BPF_REG_10, -24, 0),
         /* r0 = random value in range [-32, -16] */
         BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32),
         BPF_JMP_IMM(BPF_JLE, BPF_REG_0, 16, 2),
         BPF_MOV64_IMM(BPF_REG_0, 0),
         BPF_EXIT_INSN(),
         BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 32),
         /* fp[r0] = 0, make a variable offset write of zero,
          *             this should preserve zero marks on stack.
          */
         BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_10),
         BPF_ST_MEM(BPF_B, BPF_REG_0, 0, 0),
         /* r0 = fp[-20], if variable offset write was tracked correctly
          *               r0 would be a known zero.
          */
         BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_10, -20),
         /* Would fail return code verification if r0 range is not tracked correctly. */
         BPF_EXIT_INSN(),
         },
         .result = ACCEPT,
         /* Use prog type that requires return value in range [0, 1] */
         .prog_type = BPF_PROG_TYPE_SK_LOOKUP,
         .expected_attach_type = BPF_SK_LOOKUP,
         .runs = -1,
},

The verifier output for the insn with a variable offset and a zero write:
8: (72) *(u8 *)(r0 +0) = 0            ; R0_w=fp(smin=smin32=-32,smax=smax32=-16,umin=0xffffffffffffffe0,umax=0xfffffffffffffff0,umin32=0xffffffe0,umax32=0xfffffff0,var_off=(0xffffffffffffffe0; 0x1f)) fp-16_w=00000000 fp-24_w=00000000 fp-32=mmmmmmmm
9: (71) r0 = *(u8 *)(r10 -20)         ; R0_w=0 R10=fp0 fp-24_w=00000000

In the above, value_regno is -1, so backtracking is actually a noop.
   

>
>
>>
>> Anyways, I still claim that we are mishandling a precision of spilled
>> register when doing zero var_off writes.
>>
>>
>>
>>> [...]
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-05  7:14             ` Yonghong Song
  2024-01-05  8:10               ` Yonghong Song
@ 2024-01-05 23:37               ` Eduard Zingerman
  2024-01-08 18:59                 ` Yonghong Song
  1 sibling, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-05 23:37 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Thu, 2024-01-04 at 23:14 -0800, Yonghong Song wrote:
[...]
> There is an alternative implementation in check_stack_write_var_off().
> For a spill of value/reg 0, we can convert it to STACK_ZERO instead
> of trying to maintain STACK_SPILL. If we convert it to STACK_ZERO,
> then we can reuse the rest of logic in check_stack_write_var_off()
> and at the end we have
> 
>          if (zero_used) {
>                  /* backtracking doesn't work for STACK_ZERO yet. */
>                  err = mark_chain_precision(env, value_regno);
>                  if (err)
>                          return err;
>          }
> 
> although I do not fully understand the above either. Need to go back to
> git history to find why.

Regarding this particular code (unrelated to this patch-set).

Both check_stack_read_fixed_off() and check_stack_read_var_off()
set the destination register to zero if all bytes at the varying offset are STACK_ZERO.
Backtracking can handle fixed offset writes, but does not know how to
handle varying offset writes. E.g. if:
- there is some code 'arr[i] = r0';
- and r0 happens to be zero for some state;
- later arr[i] is used in a precise context;
the verifier would have no means to propagate the precision mark to r0.
Hence the precision mark is applied conservatively.

Does that make sense?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-05  1:05           ` Andrii Nakryiko
  2024-01-05  7:14             ` Yonghong Song
@ 2024-01-05 23:52             ` Eduard Zingerman
  2024-01-08 19:51               ` Yonghong Song
  2024-01-08 23:18               ` Andrii Nakryiko
  1 sibling, 2 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-05 23:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Thu, 2024-01-04 at 17:05 -0800, Andrii Nakryiko wrote:
[...]
> > The test is actually quite minimal, the longest part is conjuring of
> > varying offset pointer in r2, here it is with additional comments:
> > 
> >     /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */
> >     "call %[bpf_get_prandom_u32];"
> >     "r9 = 100500;"
> >     "if r0 > 42 goto +1;"
> >     "r9 = 0;"
> >     "*(u64 *)(r10 - 16) = r9;"
> >     /* prepare a variable length access */
> >     "call %[bpf_get_prandom_u32];"
> >     "r0 &= 0xf;" /* r0 range is [0; 15] */
> >     "r1 = -1;"
> >     "r1 -= r0;"  /* r1 range is [-16; -1] */
> >     "r2 = r10;"
> >     "r2 += r1;"  /* r2 range is [fp-16; fp-1] */
> >     /* do a variable length write of constant 0 */
> >     "r0 = 0;"
> >     "*(u8 *)(r2 + 0) = r0;"
[...]
> Yes, and the test fails, but if you read the log, you'll see that fp-8
> is never marked precise, but it should. So we need more elaborate test
> that would somehow exploit fp-8 imprecision.

Sorry, I don't understand why fp-8 should be precise for this particular test.
Only the value read from fp-16 is used in a precise context.

[...]
> So keep both read and write as variable offset. And we are saved by
> some missing logic in read_var_off that would set r2 as known zero
> (because it should be for the branch where both fp-8 and fp-16 are
> zero). But that fails in the branch that should succeed, and if that
> branch actually succeeds, I suspect the branch where we initialize
> with non-zero r9 will erroneously succeed.
> 
> Anyways, I still claim that we are mishandling a precision of spilled
> register when doing zero var_off writes.

Currently check_stack_read_var_off() has two possible outcomes:
- if all bytes at the varying offset are STACK_ZERO, the destination
  register is set to zero;
- otherwise the destination register is set to an unbound scalar.

Unless I missed something, STACK_ZERO is assigned to .slot_type only
in check_stack_write_fixed_off(), and there the source register is
marked as precise immediately.

So, it appears to me that current state of patch #1 is ok.

If check_stack_read_var_off() were modified to check not only for
STACK_ZERO but also for zero spills, I think all such spills would
have to be marked precise at the time of the read,
as backtracking would not be able to find them later.
But that is not related to the change in check_stack_write_var_off()
introduced by this patch.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-05 23:37               ` Eduard Zingerman
@ 2024-01-08 18:59                 ` Yonghong Song
  2024-01-08 19:06                   ` Eduard Zingerman
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-01-08 18:59 UTC (permalink / raw)
  To: Eduard Zingerman, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau


On 1/5/24 3:37 PM, Eduard Zingerman wrote:
> On Thu, 2024-01-04 at 23:14 -0800, Yonghong Song wrote:
> [...]
>> There is an alternative implementation in check_stack_write_var_off().
>> For a spill of value/reg 0, we can convert it to STACK_ZERO instead
>> of trying to maintain STACK_SPILL. If we convert it to STACK_ZERO,
>> then we can reuse the rest of logic in check_stack_write_var_off()
>> and at the end we have
>>
>>           if (zero_used) {
>>                   /* backtracking doesn't work for STACK_ZERO yet. */
>>                   err = mark_chain_precision(env, value_regno);
>>                   if (err)
>>                           return err;
>>           }
>>
>> although I do not fully understand the above either. Need to go back to
>> git history to find why.
> Regarding this particular code (unrelated to this patch-set).
>
> Both check_stack_read_fixed_off() and check_stack_read_var_off()
> set destination register to zero if all bytes at varying offset are STACK_ZERO.
> Backtracking can handle fixed offset writes, but does not know how to
> handle varying offset writes. E.g. if:
> - there is some code 'arr[i] = r0';
> - and r0 happens to be zero for some state;
> - later arr[i] is used in precise context;
> Verifier would have no means to propagate precision mark to r0.
> Hence apply precision mark conservatively.
>
> Does that make sense?

Thanks for the explanation. I guess I understand now, it does make sense.
Let us say the arr array's element type is long (8 bytes) and the range of i could be
from -32 to -1. I guess one way to do backtracking with "... = arr[i]"
is to have four ranges, [-32, -24), [-24, -16), [-16, -8), [-8, 0).
Later, when we see arr[i] = r0 and i has range [-32, 0), since it covers [-32, -24), etc.,
precision marking can proceed with 'r0'. But I guess this can potentially
increase verifier backtracking states a lot and is not scalable. Conservatively
doing precision marking with 'r0' (in arr[i] = r0) is a better idea.

Andrii has similar comments in
   https://lore.kernel.org/bpf/CAEf4Bzb0LdSPnFZ-kPRftofA6LsaOkxXLN4_fr9BLR3iG-te-g@mail.gmail.com/


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-08 18:59                 ` Yonghong Song
@ 2024-01-08 19:06                   ` Eduard Zingerman
  2024-01-08 19:40                     ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-08 19:06 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Mon, 2024-01-08 at 10:59 -0800, Yonghong Song wrote:
[...]

> I guess one way to do backtracking with "... = arr[i]"
> is to have four ranges, [-32, -24), [-24, -16), [-16, -8), [-8, 0).
> Later, when we see arr[i] = r0 and i has range [-32, 0). Since it covers [-32, -24), etc.,
> precision marking can proceed with 'r0'. But I guess this can potentially
> increase verifier backtracking states a lot and is not scalable. Conservatively
> doing precision marking with 'r0' (in arr[i] = r0) is a better idea.

In theory it should be possible to collapse this range to a min/max pair.
But it is a complication, and I'd say it shouldn't be implemented
unless we have evidence that it significantly improves verification
performance.

> 
> Andrii has similar comments in
>    https://lore.kernel.org/bpf/CAEf4Bzb0LdSPnFZ-kPRftofA6LsaOkxXLN4_fr9BLR3iG-te-g@mail.gmail.com/
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-08 19:06                   ` Eduard Zingerman
@ 2024-01-08 19:40                     ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2024-01-08 19:40 UTC (permalink / raw)
  To: Eduard Zingerman, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau


On 1/8/24 11:06 AM, Eduard Zingerman wrote:
> On Mon, 2024-01-08 at 10:59 -0800, Yonghong Song wrote:
> [...]
>
>> I guess one way to do backtracking with "... = arr[i]"
>> is to have four ranges, [-32, -24), [-24, -16), [-16, -8), [-8, 0).
>> Later, when we see arr[i] = r0 and i has range [-32, 0). Since it covers [-32, -24), etc.,
>> precision marking can proceed with 'r0'. But I guess this can potentially
>> increase verifier backtracking states a lot and is not scalable. Conservatively
>> doing precision marking with 'r0' (in arr[i] = r0) is a better idea.
> In theory it should be possible to collapse this range to min/max pair.
> But it is a complication, and I'd say it shouldn't be implemented
> unless we have evidence that it significantly improves verification
> performance.

Ack. We do not need to introduce this yet as the variable index range
should be much less common.

>
>> Andrii has similar comments in
>>     https://lore.kernel.org/bpf/CAEf4Bzb0LdSPnFZ-kPRftofA6LsaOkxXLN4_fr9BLR3iG-te-g@mail.gmail.com/
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-05 23:52             ` Eduard Zingerman
@ 2024-01-08 19:51               ` Yonghong Song
  2024-01-08 20:05                 ` Eduard Zingerman
  2024-01-08 23:18               ` Andrii Nakryiko
  1 sibling, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-01-08 19:51 UTC (permalink / raw)
  To: Eduard Zingerman, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau


On 1/5/24 3:52 PM, Eduard Zingerman wrote:
> On Thu, 2024-01-04 at 17:05 -0800, Andrii Nakryiko wrote:
> [...]
>>> The test is actually quite minimal, the longest part is conjuring of
>>> varying offset pointer in r2, here it is with additional comments:
>>>
>>>      /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */
>>>      "call %[bpf_get_prandom_u32];"
>>>      "r9 = 100500;"
>>>      "if r0 > 42 goto +1;"
>>>      "r9 = 0;"
>>>      "*(u64 *)(r10 - 16) = r9;"
>>>      /* prepare a variable length access */
>>>      "call %[bpf_get_prandom_u32];"
>>>      "r0 &= 0xf;" /* r0 range is [0; 15] */
>>>      "r1 = -1;"
>>>      "r1 -= r0;"  /* r1 range is [-16; -1] */
>>>      "r2 = r10;"
>>>      "r2 += r1;"  /* r2 range is [fp-16; fp-1] */
>>>      /* do a variable length write of constant 0 */
>>>      "r0 = 0;"
>>>      "*(u8 *)(r2 + 0) = r0;"
> [...]
>> Yes, and the test fails, but if you read the log, you'll see that fp-8
>> is never marked precise, but it should. So we need more elaborate test
>> that would somehow exploit fp-8 imprecision.
> Sorry, I don't understand why fp-8 should be precise for this particular test.
> Only value read from fp-16 is used in precise context.
>
> [...]
>> So keep both read and write as variable offset. And we are saved by
>> some missing logic in read_var_off that would set r2 as known zero
>> (because it should be for the branch where both fp-8 and fp-16 are
>> zero). But that fails in the branch that should succeed, and if that
>> branch actually succeeds, I suspect the branch where we initialize
>> with non-zero r9 will erroneously succeed.
>>
>> Anyways, I still claim that we are mishandling a precision of spilled
>> register when doing zero var_off writes.
> Currently check_stack_read_var_off() has two possible outcomes:
> - if all bytes at varying offset are STACK_ZERO destination register
>    is set to zero;
> - otherwise destination register is set to unbound scalar.
>
> Unless I missed something, STACK_ZERO is assigned to .slot_type only
> in check_stack_write_fixed_off(), and there the source register is
> marked as precise immediately.
>
> So, it appears to me that current state of patch #1 is ok.
>
> In case if check_stack_read_var_off() would be modified to check not
> only for STACK_ZERO, but also for zero spills, I think that all such
> spills would have to be marked precise at the time of read,
> as backtracking would not be able to find those later.

I don't understand the above. If the code pattern looks like
   r1 = ...; /* r1 range [-32, -16) */
   *(u8 *)(r10 + r1) = r2;
   ...
   r3 = *(u8 *)(r10 + r1);
then r3 needs to be marked as precise.

Conservatively marking r2 in '*(u8 *)(r10 + r1) = r2' as precise
should be the correct way to do it.

Or are you thinking of an even more complex code pattern like
   *(u64 *)(r10 - 32) = r4;
   *(u64 *)(r10 - 24) = r5;
   ...
   r1 = ...; /* r1 range [-32, -16) */
   r3 = *(u8 *)(r10 + r1);
where r3 needs to be marked as precise?

In this case, we should proactively mark r4 and r5 as precise.
But currently we do not do it, right?
I think this latter case is a very unlikely case.

> But that is not related to change in check_stack_write_var_off()
> introduced by this patch.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-08 19:51               ` Yonghong Song
@ 2024-01-08 20:05                 ` Eduard Zingerman
  2024-01-08 21:51                   ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-01-08 20:05 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Mon, 2024-01-08 at 11:51 -0800, Yonghong Song wrote:
[...]
> > In case if check_stack_read_var_off() would be modified to check not
> > only for STACK_ZERO, but also for zero spills, I think that all such
> > spills would have to be marked precise at the time of read,
> > as backtracking would not be able to find those later.
> 
> I don't understand the above. If the code pattern looks like
>    r1 = ...; /* r1 range [-32, -16) */
>    *(u8 *)(r10 + r1) = r2;
>    ...
>    r3 = *(u8 *)(r10 + r1);
>    r3 needs to be marked as precise.
> 
> Conservatively marking r2 in '*(u8 *)(r10 + r1) = r2' as precise
> should be the correct way to do.
> 
> Or you are thinking even more complex code pattern like
>    *(u64 *)(r10 - 32) = r4;
>    *(u64 *)(r10 - 24) = r5;
>    ...
>    r1 = ...; /* r1 range [-32, -16) */
>    r3 = *(u8 *)(r10 + r1);
>    r3 needs to be marked as precise.
> 
> In this case, we should proactively mark r4 and r5 as precise.
> But currently we did not do it, right?

Yes, I'm thinking about the latter scenario.
There would be zero spills for fp-32 and fp-24.
If check_stack_read_var_off() is modified to handle zero spills,
then it would conclude that r3 is zero.
But if r3 is later marked precise, there would be no info for
backtracking to mark fp-32, fp-24, r4, r5 as precise:
- either backtracking info would have to be supplemented with a list
  of stack locations that were spilled zeros at time of
  check_stack_read_var_off();
- or check_stack_read_var_off() would need to conservatively mark
  all spilled zeros as precise.

Nothing like that is needed now, because check_stack_read_var_off()
would return an unbound scalar for r3 upon seeing a zero spill.

> I think this latter case is a very unlikely case.

But it is possible and the verifier has to be conservative.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-08 20:05                 ` Eduard Zingerman
@ 2024-01-08 21:51                   ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2024-01-08 21:51 UTC (permalink / raw)
  To: Eduard Zingerman, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau


On 1/8/24 12:05 PM, Eduard Zingerman wrote:
> On Mon, 2024-01-08 at 11:51 -0800, Yonghong Song wrote:
> [...]
>>> In case if check_stack_read_var_off() would be modified to check not
>>> only for STACK_ZERO, but also for zero spills, I think that all such
>>> spills would have to be marked precise at the time of read,
>>> as backtracking would not be able to find those later.
>> I don't understand the above. If the code pattern looks like
>>     r1 = ...; /* r1 range [-32, -16) */
>>     *(u8 *)(r10 + r1) = r2;
>>     ...
>>     r3 = *(u8 *)(r10 + r1);
>>     r3 needs to be marked as precise.
>>
>> Conservatively marking r2 in '*(u8 *)(r10 + r1) = r2' as precise
>> should be the correct way to do.
>>
>> Or you are thinking even more complex code pattern like
>>     *(u64 *)(r10 - 32) = r4;
>>     *(u64 *)(r10 - 24) = r5;
>>     ...
>>     r1 = ...; /* r1 range [-32, -16) */
>>     r3 = *(u8 *)(r10 + r1);
>>     r3 needs to be marked as precise.
>>
>> In this case, we should proactively mark r4 and r5 as precise.
>> But currently we did not do it, right?
> Yes, I'm thinking about the latter scenario.
> There would be zero spills for fp-32 and fp-24.
> If check_stack_read_var_off() is modified to handle zero spills,
> then it would conclude that r3 is zero.
> But if r3 is later marked precise, there would be no info for
> backtracking to mark fp-32, fp-24, r4, r5 as precise:
> - either backtracking info would have to be supplemented with a list
>    of stack locations that were spilled zeros at time of
>    check_stack_read_var_off();
> - or check_stack_read_var_off() would need to conservatively mark
>    all spilled zeros as precise.
>
> Nothing like that is needed now, because check_stack_read_var_off()
> would return an unbound scalar for r3 upon seeing a zero spill.
>
>> I think this latter case is very unlikely.
> But it is possible and the verifier has to be conservative.

Indeed. Such variable stack reads are already handled properly.

Now I think I understand better how the verifier works for
loads/stores with variable offsets. Only some small changes are
needed in check_stack_write_var_off(). Will post v3 soon.



* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-05 23:52             ` Eduard Zingerman
  2024-01-08 19:51               ` Yonghong Song
@ 2024-01-08 23:18               ` Andrii Nakryiko
  1 sibling, 0 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2024-01-08 23:18 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Fri, Jan 5, 2024 at 3:52 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-01-04 at 17:05 -0800, Andrii Nakryiko wrote:
> [...]
> > > The test is actually quite minimal, the longest part is conjuring up
> > > the varying-offset pointer in r2; here it is with additional comments:
> > >
> > >     /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */
> > >     "call %[bpf_get_prandom_u32];"
> > >     "r9 = 100500;"
> > >     "if r0 > 42 goto +1;"
> > >     "r9 = 0;"
> > >     "*(u64 *)(r10 - 16) = r9;"
> > >     /* prepare a variable length access */
> > >     "call %[bpf_get_prandom_u32];"
> > >     "r0 &= 0xf;" /* r0 range is [0; 15] */
> > >     "r1 = -1;"
> > >     "r1 -= r0;"  /* r1 range is [-16; -1] */
> > >     "r2 = r10;"
> > >     "r2 += r1;"  /* r2 range is [fp-16; fp-1] */
> > >     /* do a variable length write of constant 0 */
> > >     "r0 = 0;"
> > >     "*(u8 *)(r2 + 0) = r0;"
> [...]
> > Yes, and the test fails, but if you read the log, you'll see that fp-8
> > is never marked precise, but it should be. So we need a more elaborate
> > test that would somehow exploit fp-8 imprecision.
>
> Sorry, I don't understand why fp-8 should be precise for this particular test.
> Only the value read from fp-16 is used in a precise context.
>
> [...]
> > So keep both the read and the write at a variable offset. And we are
> > saved by some missing logic in read_var_off that would set r2 as known
> > zero (because it should be for the branch where both fp-8 and fp-16
> > are zero). But that fails in the branch that should succeed, and if
> > that branch actually succeeds, I suspect the branch where we
> > initialize with non-zero r9 will erroneously succeed.
> >
> > Anyways, I still claim that we are mishandling the precision of a
> > spilled register when doing zero var_off writes.
>
> Currently check_stack_read_var_off() has two possible outcomes:
> - if all bytes at the varying offset are STACK_ZERO, the destination
>   register is set to zero;
> - otherwise the destination register is set to an unbound scalar.
>
> Unless I missed something, STACK_ZERO is assigned to .slot_type only
> in check_stack_write_fixed_off(), and there the source register is
> marked as precise immediately.
>
> So, it appears to me that the current state of patch #1 is ok.

Thinking about it some more, I agree with you: what I was concerned
about should be handled properly at read time. So I think what we have
in this patch is ok. Sorry for the noise :)
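As an aside, the two read-side outcomes described above can be sketched
as a tiny standalone model. The names and types below are simplified
stand-ins for the verifier's, not the actual kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define BPF_REG_SIZE 8

/* Hypothetical slot-type constants mirroring the verifier's. */
enum { STACK_INVALID, STACK_MISC, STACK_ZERO, STACK_SPILL };

struct scalar { int known_zero; };

/* Simplified model of check_stack_read_var_off(): the destination is a
 * known zero only if every byte in the accessed range is STACK_ZERO;
 * otherwise it is an unbound scalar. */
static struct scalar read_var_off(const uint8_t *slot_types, int off, int size)
{
	for (int i = 0; i < size; i++)
		if (slot_types[off + i] != STACK_ZERO)
			return (struct scalar){ .known_zero = 0 };
	return (struct scalar){ .known_zero = 1 };
}
```

A single non-STACK_ZERO byte anywhere in the range is enough to degrade
the result to an unbound scalar, which is why no extra precision
tracking is needed on this path today.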

>
> If check_stack_read_var_off() were modified to check not
> only for STACK_ZERO, but also for zero spills, I think all such
> spills would have to be marked precise at the time of the read,
> as backtracking would not be able to find them later.
> But that is not related to the change in check_stack_write_var_off()
> introduced by this patch.


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-03 23:26 [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Yonghong Song
                   ` (4 preceding siblings ...)
  2024-01-05  0:49 ` Andrii Nakryiko
@ 2024-01-08 23:23 ` Andrii Nakryiko
  2024-01-08 23:39   ` Yonghong Song
  5 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2024-01-08 23:23 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau

On Wed, Jan 3, 2024 at 3:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> With patch set [1], precision backtracking supports register spill/fill
> to/from the stack. Patch [2] allows an initial imprecise register spill
> with content 0. This is a common case for cpuv3 and lower for
> initializing stack variables with the pattern
>   r1 = 0
>   *(u64 *)(r10 - 8) = r1
> and [2] demonstrated good verification improvements.
>
> For cpuv4, the initialization could be
>   *(u64 *)(r10 - 8) = 0
> The current verifier marks the r10-8 contents with STACK_ZERO.
> Similar to [2], let us permit the above insn to behave like an
> imprecise register spill, which can reduce the number of verified
> states. The change is in function check_stack_write_fixed_off().
>
> Before this patch, a spilled zero is marked as STACK_ZERO,
> which can provide precise values. In check_stack_write_var_off(),
> STACK_ZERO is maintained if a const zero is written,
> so later it can provide precise values if needed.
>
> The above handling of '*(u64 *)(r10 - 8) = 0' as a spill
> would have issues in check_stack_write_var_off(), as the spill
> would be converted to STACK_MISC and the precise value 0
> would be lost. To fix this issue, if a spill slot holds a const
> zero and the BPF_ST write is also a const zero, the spill slot
> is preserved, so it can later provide precise values
> if needed. Without the change in check_stack_write_var_off(),
> the test_verifier subtest 'BPF_ST_MEM stack imm zero, variable offset'
> would fail.
>
> I checked cpuv3 and cpuv4 with and without this patch with veristat.
> There is no state change for cpuv3 since '*(u64 *)(r10 - 8) = 0'
> is only generated with cpuv4.
>
> For cpuv4:
> $ ../veristat -C old.cpuv4.csv new.cpuv4.csv -e file,prog,insns,states -f 'insns_diff!=0'
> File                                        Program              Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
> ------------------------------------------  -------------------  ---------  ---------  ---------------  ----------  ----------  -------------
> local_storage_bench.bpf.linked3.o           get_local                  228        168    -60 (-26.32%)          17          14   -3 (-17.65%)
> pyperf600_bpf_loop.bpf.linked3.o            on_event                  6066       4889  -1177 (-19.40%)         403         321  -82 (-20.35%)
> test_cls_redirect.bpf.linked3.o             cls_redirect             35483      35387     -96 (-0.27%)        2179        2177    -2 (-0.09%)
> test_l4lb_noinline.bpf.linked3.o            balancer_ingress          4494       4522     +28 (+0.62%)         217         219    +2 (+0.92%)
> test_l4lb_noinline_dynptr.bpf.linked3.o     balancer_ingress          1432       1455     +23 (+1.61%)          92          94    +2 (+2.17%)
> test_xdp_noinline.bpf.linked3.o             balancer_ingress_v6       3462       3458      -4 (-0.12%)         216         216    +0 (+0.00%)
> verifier_iterating_callbacks.bpf.linked3.o  widening                    52         41    -11 (-21.15%)           4           3   -1 (-25.00%)
> xdp_synproxy_kern.bpf.linked3.o             syncookie_tc             12412      11719    -693 (-5.58%)         345         330   -15 (-4.35%)
> xdp_synproxy_kern.bpf.linked3.o             syncookie_xdp            12478      11794    -684 (-5.48%)         346         331   -15 (-4.34%)
>
> test_l4lb_noinline and test_l4lb_noinline_dynptr have minor regressions,
> but pyperf600_bpf_loop and local_storage_bench get pretty good improvements.
>
>   [1] https://lore.kernel.org/all/20231205184248.1502704-1-andrii@kernel.org/
>   [2] https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/
>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  kernel/bpf/verifier.c                         | 21 +++++++++++++++++--
>  .../selftests/bpf/progs/verifier_spill_fill.c | 16 +++++++-------
>  2 files changed, 27 insertions(+), 10 deletions(-)
>
> Changelogs:
>   v1 -> v2:
>     - Preserve with-const-zero spill if writing is also zero
>       in check_stack_write_var_off().
>     - Add a test with not-8-byte-aligned BPF_ST store.
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d4e31f61de0e..cfe7a68d90a5 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4491,7 +4491,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>                 if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
>                         state->stack[spi].spilled_ptr.id = 0;
>         } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
> -                  insn->imm != 0 && env->bpf_capable) {
> +                  env->bpf_capable) {
>                 struct bpf_reg_state fake_reg = {};
>
>                 __mark_reg_known(&fake_reg, insn->imm);
> @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>
>         /* Variable offset writes destroy any spilled pointers in range. */
>         for (i = min_off; i < max_off; i++) {
> +               struct bpf_reg_state *spill_reg;
>                 u8 new_type, *stype;
> -               int slot, spi;
> +               int slot, spi, j;
>
>                 slot = -i - 1;
>                 spi = slot / BPF_REG_SIZE;
> +
> +               /* If writing_zero and the spi slot contains a spill of value 0,
> +                * maintain the spill type.
> +                */
> +               if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
> +                       spill_reg = &state->stack[spi].spilled_ptr;
> +                       if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {
> +                               for (j = BPF_REG_SIZE; j > 0; j--) {
> +                                       if (state->stack[spi].slot_type[j - 1] != STACK_SPILL)
> +                                               break;
> +                               }
> +                               i += BPF_REG_SIZE - j - 1;
> +                               continue;
> +                       }
> +               }
> +

Yonghong, I just replied to one of Eduard's emails. I think the overall
approach is correct.

But two small things. Above, if you detect tnum_is_const() and the value
is zero, it seems like you'd need to set zero_used = true.

But I'd actually like to propose implementing this slightly differently.
Instead of skipping multiple bytes, I think it would be better to just
check one byte at a time. Just like we have


if (writing_zero && *stype == STACK_ZERO) {
    new_type = STACK_ZERO;
    zero_used = true;
}

we can insert

if (writing_zero && *stype == STACK_SPILL && tnum_is_const(..) &&
var_off.value == 0) {
    zero_used = true;
    continue;
}

It will be very similar to the STACK_ZERO handling, but we won't be
overwriting the slot type. And handling one byte at a time is more in
line with the rest of the logic.

WDYT?
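For illustration, the per-byte scheme could be modeled standalone
roughly as below. The types and names are simplified stand-ins for the
verifier's (struct tnum here is a minimal value/mask pair), not the
actual kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BPF_REG_SIZE 8

/* Hypothetical, simplified model of the verifier's stack state. */
enum { STACK_INVALID, STACK_MISC, STACK_ZERO, STACK_SPILL };

struct tnum { uint64_t value; uint64_t mask; };

/* A tnum is a known constant when no bits are unknown. */
static int tnum_is_const(struct tnum t) { return t.mask == 0; }

struct stack_slot {
	uint8_t slot_type[BPF_REG_SIZE];
	struct tnum spilled_var_off;	/* meaningful only for STACK_SPILL */
};

/* One byte of a variable-offset write of constant zero: keep STACK_ZERO
 * bytes as STACK_ZERO, leave bytes of a spilled const-zero register
 * untouched (and note zero_used), and degrade everything else to
 * STACK_MISC. Returns nonzero if the byte was preserved as zero. */
static int write_zero_byte(struct stack_slot *slot, int byte, int *zero_used)
{
	uint8_t *stype = &slot->slot_type[byte];

	if (*stype == STACK_ZERO) {
		*zero_used = 1;
		return 1;
	}
	if (*stype == STACK_SPILL &&
	    tnum_is_const(slot->spilled_var_off) &&
	    slot->spilled_var_off.value == 0) {
		*zero_used = 1;	/* preserve the spill, skip the overwrite */
		return 1;
	}
	*stype = STACK_MISC;
	return 0;
}
```

The point of the sketch is that a const-zero spill byte is treated
exactly like a STACK_ZERO byte for a zero write, except the slot type is
left alone, so the spill (and its precision bookkeeping) survives.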

>                 stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
>                 mark_stack_slot_scratched(env, spi);
>

[...]


* Re: [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers
  2024-01-08 23:23 ` Andrii Nakryiko
@ 2024-01-08 23:39   ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2024-01-08 23:39 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Kuniyuki Iwashima,
	Martin KaFai Lau


On 1/8/24 3:23 PM, Andrii Nakryiko wrote:
> On Wed, Jan 3, 2024 at 3:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> With patch set [1], precision backtracking supports register spill/fill
>> to/from the stack. Patch [2] allows an initial imprecise register spill
>> with content 0. This is a common case for cpuv3 and lower for
>> initializing stack variables with the pattern
>>    r1 = 0
>>    *(u64 *)(r10 - 8) = r1
>> and [2] demonstrated good verification improvements.
>>
>> For cpuv4, the initialization could be
>>    *(u64 *)(r10 - 8) = 0
>> The current verifier marks the r10-8 contents with STACK_ZERO.
>> Similar to [2], let us permit the above insn to behave like an
>> imprecise register spill, which can reduce the number of verified
>> states. The change is in function check_stack_write_fixed_off().
>>
>> Before this patch, a spilled zero is marked as STACK_ZERO,
>> which can provide precise values. In check_stack_write_var_off(),
>> STACK_ZERO is maintained if a const zero is written,
>> so later it can provide precise values if needed.
>>
>> The above handling of '*(u64 *)(r10 - 8) = 0' as a spill
>> would have issues in check_stack_write_var_off(), as the spill
>> would be converted to STACK_MISC and the precise value 0
>> would be lost. To fix this issue, if a spill slot holds a const
>> zero and the BPF_ST write is also a const zero, the spill slot
>> is preserved, so it can later provide precise values
>> if needed. Without the change in check_stack_write_var_off(),
>> the test_verifier subtest 'BPF_ST_MEM stack imm zero, variable offset'
>> would fail.
>>
>> I checked cpuv3 and cpuv4 with and without this patch with veristat.
>> There is no state change for cpuv3 since '*(u64 *)(r10 - 8) = 0'
>> is only generated with cpuv4.
>>
>> For cpuv4:
>> $ ../veristat -C old.cpuv4.csv new.cpuv4.csv -e file,prog,insns,states -f 'insns_diff!=0'
>> File                                        Program              Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
>> ------------------------------------------  -------------------  ---------  ---------  ---------------  ----------  ----------  -------------
>> local_storage_bench.bpf.linked3.o           get_local                  228        168    -60 (-26.32%)          17          14   -3 (-17.65%)
>> pyperf600_bpf_loop.bpf.linked3.o            on_event                  6066       4889  -1177 (-19.40%)         403         321  -82 (-20.35%)
>> test_cls_redirect.bpf.linked3.o             cls_redirect             35483      35387     -96 (-0.27%)        2179        2177    -2 (-0.09%)
>> test_l4lb_noinline.bpf.linked3.o            balancer_ingress          4494       4522     +28 (+0.62%)         217         219    +2 (+0.92%)
>> test_l4lb_noinline_dynptr.bpf.linked3.o     balancer_ingress          1432       1455     +23 (+1.61%)          92          94    +2 (+2.17%)
>> test_xdp_noinline.bpf.linked3.o             balancer_ingress_v6       3462       3458      -4 (-0.12%)         216         216    +0 (+0.00%)
>> verifier_iterating_callbacks.bpf.linked3.o  widening                    52         41    -11 (-21.15%)           4           3   -1 (-25.00%)
>> xdp_synproxy_kern.bpf.linked3.o             syncookie_tc             12412      11719    -693 (-5.58%)         345         330   -15 (-4.35%)
>> xdp_synproxy_kern.bpf.linked3.o             syncookie_xdp            12478      11794    -684 (-5.48%)         346         331   -15 (-4.34%)
>>
>> test_l4lb_noinline and test_l4lb_noinline_dynptr have minor regressions,
>> but pyperf600_bpf_loop and local_storage_bench get pretty good improvements.
>>
>>    [1] https://lore.kernel.org/all/20231205184248.1502704-1-andrii@kernel.org/
>>    [2] https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/
>>
>> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
>> Cc: Martin KaFai Lau <kafai@fb.com>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   kernel/bpf/verifier.c                         | 21 +++++++++++++++++--
>>   .../selftests/bpf/progs/verifier_spill_fill.c | 16 +++++++-------
>>   2 files changed, 27 insertions(+), 10 deletions(-)
>>
>> Changelogs:
>>    v1 -> v2:
>>      - Preserve with-const-zero spill if writing is also zero
>>        in check_stack_write_var_off().
>>      - Add a test with not-8-byte-aligned BPF_ST store.
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index d4e31f61de0e..cfe7a68d90a5 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -4491,7 +4491,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>>                  if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
>>                          state->stack[spi].spilled_ptr.id = 0;
>>          } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
>> -                  insn->imm != 0 && env->bpf_capable) {
>> +                  env->bpf_capable) {
>>                  struct bpf_reg_state fake_reg = {};
>>
>>                  __mark_reg_known(&fake_reg, insn->imm);
>> @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>>
>>          /* Variable offset writes destroy any spilled pointers in range. */
>>          for (i = min_off; i < max_off; i++) {
>> +               struct bpf_reg_state *spill_reg;
>>                  u8 new_type, *stype;
>> -               int slot, spi;
>> +               int slot, spi, j;
>>
>>                  slot = -i - 1;
>>                  spi = slot / BPF_REG_SIZE;
>> +
>> +               /* If writing_zero and the spi slot contains a spill of value 0,
>> +                * maintain the spill type.
>> +                */
>> +               if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {
>> +                       spill_reg = &state->stack[spi].spilled_ptr;
>> +                       if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {
>> +                               for (j = BPF_REG_SIZE; j > 0; j--) {
>> +                                       if (state->stack[spi].slot_type[j - 1] != STACK_SPILL)
>> +                                               break;
>> +                               }
>> +                               i += BPF_REG_SIZE - j - 1;
>> +                               continue;
>> +                       }
>> +               }
>> +
> Yonghong, I just replied to one of Eduard's emails. I think the overall
> approach is correct.
>
> But two small things. Above, if you detect tnum_is_const() and the value
> is zero, it seems like you'd need to set zero_used = true.

Yes, my planned change was to add mark_chain_precision() explicitly after
    if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) {

But yes, setting zero_used = true is much simpler.

>
> But I'd actually like to propose implementing this slightly differently.
> Instead of skipping multiple bytes, I think it would be better to just
> check one byte at a time. Just like we have
>
>
> if (writing_zero && *stype == STACK_ZERO) {
>      new_type = STACK_ZERO;
>      zero_used = true;
> }
>
> we can insert
>
> if (writing_zero && *stype == STACK_SPILL && tnum_is_const(..) &&
> var_off.value == 0) {
>      zero_used = true;
>      continue;
> }
>
> It will be very similar to the STACK_ZERO handling, but we won't be
> overwriting the slot type. And handling one byte at a time is more in
> line with the rest of the logic.
>
> WDYT?

Thanks for the suggestion. Sounds good. Will do.

>
>>                  stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
>>                  mark_stack_slot_scratched(env, spi);
>>
> [...]


end of thread, newest: 2024-01-08 23:39 UTC

Thread overview: 26+ messages
2024-01-03 23:26 [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Yonghong Song
2024-01-03 23:26 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add a selftest with not-8-byte aligned BPF_ST Yonghong Song
2024-01-04 16:37 ` [PATCH bpf-next v2 1/2] bpf: Track aligned st store as imprecise spilled registers Eduard Zingerman
2024-01-04 17:13   ` Yonghong Song
2024-01-04 18:43     ` Eduard Zingerman
2024-01-04 18:30 ` Eduard Zingerman
2024-01-04 20:12   ` Yonghong Song
2024-01-04 21:10     ` Eduard Zingerman
2024-01-04 23:09       ` Andrii Nakryiko
2024-01-04 23:29         ` Eduard Zingerman
2024-01-05  1:05           ` Andrii Nakryiko
2024-01-05  7:14             ` Yonghong Song
2024-01-05  8:10               ` Yonghong Song
2024-01-05 23:37               ` Eduard Zingerman
2024-01-08 18:59                 ` Yonghong Song
2024-01-08 19:06                   ` Eduard Zingerman
2024-01-08 19:40                     ` Yonghong Song
2024-01-05 23:52             ` Eduard Zingerman
2024-01-08 19:51               ` Yonghong Song
2024-01-08 20:05                 ` Eduard Zingerman
2024-01-08 21:51                   ` Yonghong Song
2024-01-08 23:18               ` Andrii Nakryiko
2024-01-04 23:03 ` Andrii Nakryiko
2024-01-05  0:49 ` Andrii Nakryiko
2024-01-08 23:23 ` Andrii Nakryiko
2024-01-08 23:39   ` Yonghong Song
