BPF List
* [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks
@ 2025-03-12  3:13 Eduard Zingerman
  2025-03-12  3:13 ` [PATCH bpf-next v1 2/2] selftests/bpf: test case with missing read/precision mark Eduard Zingerman
  2025-03-13 19:28 ` [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks Eduard Zingerman
  0 siblings, 2 replies; 9+ messages in thread
From: Eduard Zingerman @ 2025-03-12  3:13 UTC (permalink / raw)
  To: bpf, ast
  Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song,
	Eduard Zingerman

Suppose the verifier state exploration graph looks as follows:

    .-> A --.    Suppose:
    |   |   |    - state A is at iterator 'next';
    |   v   v    - path A -> B -> A is verified first;
    '-- B   C    - path A -> C is verified next;
                 - B does not impose a read mark for register R1;
                 - C imposes a read mark for register R1;

Under such conditions:
- when B is explored and A is identified as its loop entry, the read
  marks are copied from A to B by propagate_liveness(), but these
  marks do not include R1;
- when C is explored, the read mark for R1 is propagated to A,
  but not to B;
- at this point, state A has its branch count at zero, but state
  B has incomplete read marks.

The same logic applies to precision marks.
This means that states with a loop entry can have incomplete read and
precision marks, regardless of whether the loop entry itself has
branches.

The current verification logic does not account for this. An example
of an unsafe program accepted by the verifier is the selftest included
in the next patch.

Fix this by removing bpf_verifier_state->branches checks for loop
entries in clean_live_states() and is_state_visited().

Verification performance impact for selftests and sched_ext:

========= selftests: master vs patch =========

File                                Program            States (A)  States (B)  States (DIFF)
----------------------------------  -----------------  ----------  ----------  -------------
iters.bpf.o                         clean_live_states          66          67    +1 (+1.52%)
verifier_iterating_callbacks.bpf.o  cond_break2                10          13   +3 (+30.00%)

Total progs: 3579
Old success: 2061
New success: 2061
States diff min:    0.00%
States diff max:   30.00%
   0 .. 5    %: 3578
  30 .. 35   %: 1

========= sched_ext: master vs patch =========

File            Program           States (A)  States (B)  States (DIFF)
--------------  ----------------  ----------  ----------  -------------
bpf.bpf.o       layered_dispatch         501         516   +15 (+2.99%)
bpf.bpf.o       layered_dump             237         252   +15 (+6.33%)
bpf.bpf.o       layered_init             423         432    +9 (+2.13%)
bpf.bpf.o       p2dq_init                142         144    +2 (+1.41%)
scx_pair.bpf.o  pair_dispatch            111         138  +27 (+24.32%)
scx_qmap.bpf.o  qmap_dump                 22          30   +8 (+36.36%)
scx_qmap.bpf.o  qmap_init                654         656    +2 (+0.31%)

Total progs: 216
Old success: 186
New success: 186
States diff min:    0.00%
States diff max:   36.36%
   0 .. 5    %: 213
   5 .. 15   %: 1
  20 .. 30   %: 1
  35 .. 40   %: 1

Fixes: 2a0992829ea3 ("bpf: correct loop detection for iterators convergence")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 kernel/bpf/verifier.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3303a3605ee8..6c18628fa9d8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18111,7 +18111,7 @@ static void clean_live_states(struct bpf_verifier_env *env, int insn,
 		if (sl->state.branches)
 			continue;
 		loop_entry = get_loop_entry(env, &sl->state);
-		if (!IS_ERR_OR_NULL(loop_entry) && loop_entry->branches)
+		if (!IS_ERR_OR_NULL(loop_entry))
 			continue;
 		if (sl->state.insn_idx != insn ||
 		    !same_callsites(&sl->state, cur))
@@ -18816,7 +18816,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 	struct bpf_verifier_state_list *sl;
 	struct bpf_verifier_state *cur = env->cur_state, *new, *loop_entry;
 	int i, j, n, err, states_cnt = 0;
-	bool force_new_state, add_new_state, force_exact;
+	bool force_new_state, add_new_state;
 	struct list_head *pos, *tmp, *head;
 
 	force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx) ||
@@ -18996,9 +18996,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 		loop_entry = get_loop_entry(env, &sl->state);
 		if (IS_ERR(loop_entry))
 			return PTR_ERR(loop_entry);
-		force_exact = loop_entry && loop_entry->branches > 0;
-		if (states_equal(env, &sl->state, cur, force_exact ? RANGE_WITHIN : NOT_EXACT)) {
-			if (force_exact)
+		if (states_equal(env, &sl->state, cur, loop_entry ? RANGE_WITHIN : NOT_EXACT)) {
+			if (loop_entry)
 				update_loop_entry(env, cur, loop_entry);
 hit:
 			sl->hit_cnt++;
-- 
2.48.1



* [PATCH bpf-next v1 2/2] selftests/bpf: test case with missing read/precision mark
  2025-03-12  3:13 [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks Eduard Zingerman
@ 2025-03-12  3:13 ` Eduard Zingerman
  2025-03-13 19:28 ` [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks Eduard Zingerman
  1 sibling, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2025-03-12  3:13 UTC (permalink / raw)
  To: bpf, ast
  Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song,
	Eduard Zingerman

The test case is equivalent of the following C program:

   1: r8 = bpf_get_prandom_u32();
   2: r6 = -32;
   3: bpf_iter_num_new(&fp[-8], 0, 10);
   4: if (unlikely(bpf_get_prandom_u32()))
   5:   r6 = -31;
   6: for (;;) {
   7:   if (!bpf_iter_num_next(&fp[-8]))
   8:     break;
   9:   if (unlikely(bpf_get_prandom_u32()))
  10:     *(u64 *)(fp + r6) = 7;
  11: }
  12: bpf_iter_num_destroy(&fp[-8]);
  13: return 0;

Without the fix (which instructs the verifier to ignore the branch
count for loop entries), verification proceeds as follows:
- 1-4, state is {r6=-32,fp-8=active};
- 6, checkpoint A is created with {r6=-32,fp-8=active};
- 7, checkpoint B is created with {r6=-32,fp-8=active},
     push state {r6=-32,fp-8=active} from 7 to 9;
- 8,12,13, {r6=-32,fp-8=drained}, exit;
- pop state with {r6=-32,fp-8=active} from 7 to 9;
- 9, push state {r6=-32,fp-8=active} from 9 to 10;
- 6, checkpoint C is created with {r6=-32,fp-8=active};
- 7, checkpoint A is hit, no precision marks are propagated for r6
     to C;
- pop state {r6=-32,fp-8=active} from 9 to 10;
- 10, state is {r6=-31,fp-8=active}, r6 is marked as read and precise,
      these marks are propagated to checkpoints A and B (but not C,
      as it is not the parent of the current state);
- 6, {r6=-31,fp-8=active} checkpoint C is hit because r6 is not
     marked precise for this checkpoint;
- the program is accepted, despite a possibility of unaligned u64
  stack access at offset -31.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/testing/selftests/bpf/progs/iters.c | 65 +++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index 427b72954b87..ffb574321404 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -1651,4 +1651,69 @@ int clean_live_states(const void *ctx)
 	return 0;
 }
 
+SEC("?raw_tp")
+__flag(BPF_F_TEST_STATE_FREQ)
+__failure __msg("misaligned stack access off 0+-31+0 size 8")
+__naked int absent_mark_in_the_middle_state(void)
+{
+	/* This is equivalent to C program below.
+	 *
+	 * r8 = bpf_get_prandom_u32();
+	 * r6 = -32;
+	 * bpf_iter_num_new(&fp[-8], 0, 10);
+	 * if (unlikely(bpf_get_prandom_u32()))
+	 *   r6 = -31;
+	 * while (bpf_iter_num_next(&fp[-8])) {
+	 *   if (unlikely(bpf_get_prandom_u32()))
+	 *     *(fp + r6) = 7;
+	 * }
+	 * bpf_iter_num_destroy(&fp[-8])
+	 * return 0
+	 */
+	asm volatile (
+		"call %[bpf_get_prandom_u32];"
+		"r8 = r0;"
+		"r7 = 0;"
+		"r6 = -32;"
+		"r0 = 0;"
+		"*(u64 *)(r10 - 16) = r0;"
+		"r1 = r10;"
+		"r1 += -8;"
+		"r2 = 0;"
+		"r3 = 10;"
+		"call %[bpf_iter_num_new];"
+		"call %[bpf_get_prandom_u32];"
+		"if r0 == r8 goto change_r6_%=;"
+	"loop_%=:"
+		"r1 = r10;"
+		"r1 += -8;"
+		"call %[bpf_iter_num_next];"
+		"if r0 == 0 goto loop_end_%=;"
+		"call %[bpf_get_prandom_u32];"
+		"if r0 == r8 goto use_r6_%=;"
+		"goto loop_%=;"
+	"loop_end_%=:"
+		"r1 = r10;"
+		"r1 += -8;"
+		"call %[bpf_iter_num_destroy];"
+		"r0 = 0;"
+		"exit;"
+	"use_r6_%=:"
+		"r0 = r10;"
+		"r0 += r6;"
+		"r1 = 7;"
+		"*(u64 *)(r0 + 0) = r1;"
+		"goto loop_%=;"
+	"change_r6_%=:"
+		"r6 = -31;"
+		"goto loop_%=;"
+		:
+		: __imm(bpf_iter_num_new),
+		  __imm(bpf_iter_num_next),
+		  __imm(bpf_iter_num_destroy),
+		  __imm(bpf_get_prandom_u32)
+		: __clobber_all
+	);
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.48.1



* Re: [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks
  2025-03-12  3:13 [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks Eduard Zingerman
  2025-03-12  3:13 ` [PATCH bpf-next v1 2/2] selftests/bpf: test case with missing read/precision mark Eduard Zingerman
@ 2025-03-13 19:28 ` Eduard Zingerman
  2025-03-14 17:41   ` Eduard Zingerman
  1 sibling, 1 reply; 9+ messages in thread
From: Eduard Zingerman @ 2025-03-13 19:28 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song

On Tue, 2025-03-11 at 20:13 -0700, Eduard Zingerman wrote:
> Suppose the verifier state exploration graph looks as follows:
> 
>     .-> A --.    Suppose:
>     |   |   |    - state A is at iterator 'next';
>     |   v   v    - path A -> B -> A is verified first;
>     '-- B   C    - path A -> C is verified next;
>                  - B does not impose a read mark for register R1;
>                  - C imposes a read mark for register R1;
> 
> Under such conditions:
> - when B is explored and A is identified as its loop entry, the read
>   marks are copied from A to B by propagate_liveness(), but these
>   marks do not include R1;
> - when C is explored, the read mark for R1 is propagated to A,
>   but not to B.
> - at this point, state A has its branch count at zero, but state
>   B has incomplete read marks.
> 
> The same logic applies to precision marks.
> This means that states with a loop entry can have incomplete read and
> precision marks, regardless of whether the loop entry itself has
> branches.

Which makes me wonder.
If read/precision marks for B are not final and some state D outside
of the loop becomes equal to B, the read/precision marks for that
state would be incomplete as well:

        D------.  // as some read/precision marks are missing from C
               |  // propagate_liveness() won't copy all necessary
    .-> A --.  |  // marks to D.
    |   |   |  |
    |   v   v  |
    '-- B   C  |
        ^      |
        '------'

This makes comparison with 'loop_entry' states contagious,
propagating incomplete read/precision mark flag up to the root state.
This will have verification performance implications.

Alternatively read/precision marks need to be propagated in the state
graph until fixed point is reached. Like with DFA analysis.

A sieve (i.e., full of holes).

> The current verification logic does not account for this. An example
> of an unsafe program accepted by the verifier is the selftest included
> in the next patch.

[...]



* Re: [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks
  2025-03-13 19:28 ` [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks Eduard Zingerman
@ 2025-03-14 17:41   ` Eduard Zingerman
  2025-03-15  2:51     ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Eduard Zingerman @ 2025-03-14 17:41 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song

On Thu, 2025-03-13 at 12:28 -0700, Eduard Zingerman wrote:

[...]

> Which makes me wonder.
> If read/precision marks for B are not final and some state D outside
> of the loop becomes equal to B, the read/precision marks for that
> state would be incomplete as well:
> 
>         D------.  // as some read/precision marks are missing from C
>                |  // propagate_liveness() won't copy all necessary
>     .-> A --.  |  // marks to D.
>     |   |   |  |
>     |   v   v  |
>     '-- B   C  |
>         ^      |
>         '------'
> 
> This makes comparison with 'loop_entry' states contagious,
> propagating incomplete read/precision mark flag up to the root state.
> This will have verification performance implications.
> 
> Alternatively read/precision marks need to be propagated in the state
> graph until fixed point is reached. Like with DFA analysis.
> 
> A sieve (i.e., full of holes).

And below is an example that the verifier does not catch.
Another possibility is to forgo loop entries altogether and upon
states_equal(cached, cur, RANGE_WITHIN) mark all registers in the
`cached` state as read and precise, propagating this info in `cur`.
I'll try this as well.

--- 8< --------------------------------
SEC("?raw_tp")
__flag(BPF_F_TEST_STATE_FREQ)
__failure __msg("misaligned stack access off 0+-31+0 size 8")
__naked int absent_mark_in_the_middle_state2(void)
{
	asm volatile (
		"call %[bpf_get_prandom_u32];"
		"r8 = r0;"
		"r7 = 0;"
		"r6 = -32;"
		"r0 = 0;"
		"*(u64 *)(r10 - 16) = r0;"
		"r1 = r10;"
		"r1 += -8;"
		"r2 = 0;"
		"r3 = 10;"
		"call %[bpf_iter_num_new];"
		"call %[bpf_get_prandom_u32];"
		"if r0 == r8 goto change_r6_%=;"
		"call %[bpf_get_prandom_u32];"
	"before_loop_%=:"
		"if r0 == r8 goto jump_into_loop_%=;"
	"loop_%=:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_next];"
		"if r0 == 0 goto loop_end_%=;"
		"call %[bpf_get_prandom_u32];"
		"if r0 == r8 goto use_r6_%=;"
		"goto loop_%=;"
	"loop_end_%=:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_destroy];"
		"r0 = 0;"
		"exit;"
	"use_r6_%=:"
		"r0 = r10;"
		"r0 += r6;"
		"r1 = 7;"
		"*(u64 *)(r0 + 0) = r1;"
		"goto loop_%=;"
	"change_r6_%=:"
		"r6 = -31;"
	"jump_into_loop_%=: "
		"goto +0;"
		"goto loop_%=;"
		:
		: __imm(bpf_iter_num_new),
		  __imm(bpf_iter_num_next),
		  __imm(bpf_iter_num_destroy),
		  __imm(bpf_get_prandom_u32)
		: __clobber_all
	);
}
-------------------------------- >8 ---

[...]



* Re: [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks
  2025-03-14 17:41   ` Eduard Zingerman
@ 2025-03-15  2:51     ` Alexei Starovoitov
  2025-03-15  6:04       ` Eduard Zingerman
  2025-03-17 20:28       ` Eduard Zingerman
  0 siblings, 2 replies; 9+ messages in thread
From: Alexei Starovoitov @ 2025-03-15  2:51 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song

On Fri, Mar 14, 2025 at 10:41 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2025-03-13 at 12:28 -0700, Eduard Zingerman wrote:
>
> [...]
>
> > Which makes me wonder.
> > If read/precision marks for B are not final and some state D outside
> > of the loop becomes equal to B, the read/precision marks for that
> > state would be incomplete as well:
> >
> >         D------.  // as some read/precision marks are missing from C
> >                |  // propagate_liveness() won't copy all necessary
> >     .-> A --.  |  // marks to D.
> >     |   |   |  |
> >     |   v   v  |
> >     '-- B   C  |
> >         ^      |
> >         '------'
> >
> > This makes comparison with 'loop_entry' states contagious,
> > propagating incomplete read/precision mark flag up to the root state.
> > This will have verification performance implications.
> >
> > Alternatively read/precision marks need to be propagated in the state
> > graph until fixed point is reached. Like with DFA analysis.
> >
> > A sieve (i.e., full of holes).
>
> And below is an example that verifier does not catch.

Looks like the whole concept of old-style liveness and precision
is broken with loops.
propagate_liveness() will take marks from old state,
but old is incomplete, so propagating them into cur doesn't
make cur complete either.

> Another possibility is to forgo loop entries altogether and upon
> states_equal(cached, cur, RANGE_WITHIN) mark all registers in the
> `cached` state as read and precise, propagating this info in `cur`.
> I'll try this as well.

Have a gut feel that it won't work.
Currently we have loop_entry->branches is a flag of "completeness".
which doesn't work for loops,
so maybe we need a bool flag for looping states and instead of:
force_exact = loop_entry && complete
use
force_exact = loop_entry || incomplete

looping state will have "incomplete" flag cleared only when branches == 0 ?
or maybe never.

The further we get into this the more I think we need to get rid of
existing liveness, precision, and everything path sensitive and
convert it all to data flow analysis.


* Re: [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks
  2025-03-15  2:51     ` Alexei Starovoitov
@ 2025-03-15  6:04       ` Eduard Zingerman
  2025-03-17 20:28       ` Eduard Zingerman
  1 sibling, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2025-03-15  6:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song

On Fri, 2025-03-14 at 19:51 -0700, Alexei Starovoitov wrote:
> On Fri, Mar 14, 2025 at 10:41 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Thu, 2025-03-13 at 12:28 -0700, Eduard Zingerman wrote:
> > 
> > [...]
> > 
> > > Which makes me wonder.
> > > If read/precision marks for B are not final and some state D outside
> > > of the loop becomes equal to B, the read/precision marks for that
> > > state would be incomplete as well:
> > > 
> > >         D------.  // as some read/precision marks are missing from C
> > >                |  // propagate_liveness() won't copy all necessary
> > >     .-> A --.  |  // marks to D.
> > >     |   |   |  |
> > >     |   v   v  |
> > >     '-- B   C  |
> > >         ^      |
> > >         '------'
> > > 
> > > This makes comparison with 'loop_entry' states contagious,
> > > propagating incomplete read/precision mark flag up to the root state.
> > > This will have verification performance implications.
> > > 
> > > Alternatively read/precision marks need to be propagated in the state
> > > graph until fixed point is reached. Like with DFA analysis.
> > > 
> > > A sieve (i.e., full of holes).
> > 
> > And below is an example that verifier does not catch.
> 
> Looks like the whole concept of old-style liveness and precision
> is broken with loops.
> propagate_liveness() will take marks from old state,
> but old is incomplete, so propagating them into cur doesn't
> make cur complete either.

Yes.
 
> > Another possibility is to forgo loop entries altogether and upon
> > states_equal(cached, cur, RANGE_WITHIN) mark all registers in the
> > `cached` state as read and precise, propagating this info in `cur`.
> > I'll try this as well.
> 
> Have a gut feel that it won't work.
> Currently we have loop_entry->branches is a flag of "completeness".
> which doesn't work for loops,
> so maybe we need a bool flag for looping states and instead of:
> force_exact = loop_entry && complete
> use
> force_exact = loop_entry || incomplete
> 
> looping state will have "incomplete" flag cleared only when branches == 0 ?
> or maybe never.

I think about it as follows:
- We can think about our path-tracing as if it is a path sensitive DFA;
- To make path sensitivity work we instantiate each "basic block"
  (span of instructions) for each path;
- Hence, instead of CFG there is a graph of states;
- So, use/def problem can be solved on this graph just like it is
  solved for CFG, but one needs to keep the complete graph to
  propagate the marks and reach fixed point;
- When "loop entry" is reached, assuming that all registers are read
  and all scalars are precise in that state is an over-approximation
  of an exact solution.
  
But I might miss something important.

> The further we get into this the more I think we need to get rid of
> existing liveness, precision, and everything path sensitive and
> convert it all to data flow analysis.

That's what I think as well.
I'll get back with analysis for cls_redirect and balancer_ingress from [1].

[1] https://lore.kernel.org/bpf/CAADnVQKcOLDqwhhQpy6YU13ZbY3edGgx1XpXF5VsmXt9Byxokg@mail.gmail.com/



* Re: [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks
  2025-03-15  2:51     ` Alexei Starovoitov
  2025-03-15  6:04       ` Eduard Zingerman
@ 2025-03-17 20:28       ` Eduard Zingerman
  2025-03-17 21:46         ` Alexei Starovoitov
  1 sibling, 1 reply; 9+ messages in thread
From: Eduard Zingerman @ 2025-03-17 20:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song

On Fri, 2025-03-14 at 19:51 -0700, Alexei Starovoitov wrote:

[...]

> Looks like the whole concept of old-style liveness and precision
> is broken with loops.
> propagate_liveness() will take marks from old state,
> but old is incomplete, so propagating them into cur doesn't
> make cur complete either.
> 
> > Another possibility is to forgo loop entries altogether and upon
> > states_equal(cached, cur, RANGE_WITHIN) mark all registers in the
> > `cached` state as read and precise, propagating this info in `cur`.
> > I'll try this as well.
> 
> Have a gut feel that it won't work.
> Currently we have loop_entry->branches is a flag of "completeness".
> which doesn't work for loops,
> so maybe we need a bool flag for looping states and instead of:
> force_exact = loop_entry && complete
> use
> force_exact = loop_entry || incomplete
> 
> looping state will have "incomplete" flag cleared only when branches == 0 ?
> or maybe never.
> 
> The further we get into this the more I think we need to get rid of
> existing liveness, precision, and everything path sensitive and
> convert it all to data flow analysis.

In [1] I tried conservatively marking cached state registers as both
read and precise when states_equal(cached, cur, RANGE_WITHIN) == true.
It works for the example at hand, but indeed falls apart because it
interferes with widening logic. So, anything like below can't be
verified anymore (what was I thinking?):

    SEC("raw_tp")
    __success
    int iter_nested_deeply_iters(const void *ctx)
    {
    	int sum = 0;
    
    	MY_PID_GUARD();
    
    	bpf_repeat(10) {
    		...
    		sum += 1;
    		...
    	}
    
    	return sum;
    }

Looks like there are only two options left:
a. propagate read/precision marks in state graph until fixed point is
   reached;
b. switch to CFG based DFA analysis (will need slot/register type
   propagation to identify which scalars can be precise).

(a) is more in line with what we do currently and probably less code.
But I expect that it would be harder to reason about when something
goes wrong (e.g. we don't print/draw the states graph at the moment;
I have a debug patch for this that I just cherry-pick).

[1] https://github.com/eddyz87/bpf/tree/incomplete-read-marks-debug.1



* Re: [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks
  2025-03-17 20:28       ` Eduard Zingerman
@ 2025-03-17 21:46         ` Alexei Starovoitov
  2025-03-21 20:02           ` Eduard Zingerman
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2025-03-17 21:46 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song

On Mon, Mar 17, 2025 at 1:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-03-14 at 19:51 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > Looks like the whole concept of old-style liveness and precision
> > is broken with loops.
> > propagate_liveness() will take marks from old state,
> > but old is incomplete, so propagating them into cur doesn't
> > make cur complete either.
> >
> > > Another possibility is to forgo loop entries altogether and upon
> > > states_equal(cached, cur, RANGE_WITHIN) mark all registers in the
> > > `cached` state as read and precise, propagating this info in `cur`.
> > > I'll try this as well.
> >
> > Have a gut feel that it won't work.
> > Currently we have loop_entry->branches is a flag of "completeness".
> > which doesn't work for loops,
> > so maybe we need a bool flag for looping states and instead of:
> > force_exact = loop_entry && complete
> > use
> > force_exact = loop_entry || incomplete
> >
> > looping state will have "incomplete" flag cleared only when branches == 0 ?
> > or maybe never.
> >
> > The further we get into this the more I think we need to get rid of
> > existing liveness, precision, and everything path sensitive and
> > convert it all to data flow analysis.
>
> In [1] I tried conservatively marking cached state registers as both
> read and precise when states_equal(cached, cur, RANGE_WITHIN) == true.
> It works for the example at hand, but indeed falls apart because it
> interferes with widening logic.

hindsight 20/20 :)

> So, anything like below can't be
> verified anymore (what was I thinking?):
>
>     SEC("raw_tp")
>     __success
>     int iter_nested_deeply_iters(const void *ctx)
>     {
>         int sum = 0;
>
>         MY_PID_GUARD();
>
>         bpf_repeat(10) {
>                 ...
>                 sum += 1;
>                 ...
>         }
>
>         return sum;
>     }
>
> Looks like there are only two options left:
> a. propagate read/precision marks in state graph until fixed point is
>    reached;
> b. switch to CFG based DFA analysis (will need slot/register type
>    propagation to identify which scalars can be precise).
>
> (a) is more in line with what we do currently and probably less code.
> But I expect that it would be harder to think about in case if
> something goes wrong (e.g. we don't print/draw states graph at the
> moment, I have a debug patch for this that I just cherry-pick).

if (a) is not a ton of code it's worth giving it a shot,
but while(!converged) propagate_precision()
will look scary.
We definitely need to do (b) at the end.


* Re: [PATCH bpf-next v1 1/2] bpf: states with loop entry have incomplete read/precision marks
  2025-03-17 21:46         ` Alexei Starovoitov
@ 2025-03-21 20:02           ` Eduard Zingerman
  0 siblings, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2025-03-21 20:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song

On Mon, 2025-03-17 at 14:46 -0700, Alexei Starovoitov wrote:

[...]

> > Looks like there are only two options left:
> > a. propagate read/precision marks in state graph until fixed point is
> >    reached;
> > b. switch to CFG based DFA analysis (will need slot/register type
> >    propagation to identify which scalars can be precise).
> >
> > (a) is more in line with what we do currently and probably less code.
> > But I expect that it would be harder to think about in case if
> > something goes wrong (e.g. we don't print/draw states graph at the
> > moment, I have a debug patch for this that I just cherry-pick).
>
> if (a) is not a ton of code it's worth giving it a shot,
> but while(!converged) propagate_precision()
> will look scary.
> We definitely need to do (b) at the end.

I tried a simpler approach we talked about on Wednesday.
Here is an implementation using strongly connected components:
https://github.com/eddyz87/bpf/tree/scc-instead-of-loop-entry

The idea is as follows:
- Compute strongly connected components on control flow graph;
- The number of such components is not big for selftests and
  sched_ext;
- For each component track the following:
  - A flag indicating if states loop had ever been formed by states
    originating in this component
    (states loop is states_equal(...RANGE_WITHIN)).
    When such flag is set the states within the component need to be
    compared using RANGE_WITHIN logic.
  - A counter for "yet to be explored / currently explored" states
    originating from the component, plus an epoch counter. The epoch
    counter is incremented each time the "yet to be explored" counter
    reaches zero. Verifier states are extended to keep an epoch
    counter if the state's insn_idx is within the component. The
    epoch counter is needed to know whether two verifier states are
    within the same states sub-tree.
- in clean_live_states() mark all state registers/stack slots as read
  and precise if:
  - state->insn_idx is within a strongly connected component;
  - state loops had been formed for states within this component;
  - state's component's epoch differs from current component's epoch,
    meaning that sub-tree of child states had already been explored
    and there would be no need to apply widening within this component.

This almost works.
It comes short for a single selftest:

    int cond_break1(const void *ctx)
    {
    	unsigned long i;
    	unsigned int sum = 0;

    	for (i = zero; i < ARR_SZ && can_loop; i++) // Loop A
    		sum += i;
    	for (i = zero; i < ARR_SZ; i++) {           // Loop B
    		barrier_var(i);
    		sum += i + arr[i];
    		cond_break;
    	}

    	return sum;
    }

The test is still verified, but the number of explored states goes
from 10 to 8K. The problem here is that loop B converges first and
`clean_live_states()` marks `sum` as precise at the entry to loop B.
Then, when a branch originating from loop A reaches the entry to
loop B, it hits a checkpoint and propagates `sum` to the `may_goto`
checkpoint at the beginning of loop A.
Hence, the widening logic for `sum` does not kick in.

There is also a problem with one sched_ext program (lavd_dispatch),
where eager precision marks applied to the cached state confuse
mark_chain_precision(): it hits a bug assert when trying to mark r1
precise while backtracking a function call. This is something to
debug further.

---

I also tried a more complex approach, where:
- state loop backedges are accumulated for each strongly connected
  component;
- when component is done exploring (using same branch/epoch logic as
  above) do propagate_{liveness,precision} along the backedges while
  precision marks change.

It is here:
https://github.com/eddyz87/bpf/tree/read-marks-stead-state-per-scc
The test cases at hand pass (absent_mark_in_the_middle_state{,2}),
but it hits memory protection errors on a complete test_progs run
and requires further debugging.

Overall, I don't like the level of complexity this approach involves.
It appears that it would be better to invest this complexity into a
compiler style pass computing a conservative approximation of precise
registers / stack slots.


