BPF List
* [PATCH bpf-next v1 1/2] bpf: force checkpoint when jmp history is too long
@ 2024-10-18  2:03 Eduard Zingerman
  2024-10-18  2:03 ` [PATCH bpf-next v1 2/2] selftests/bpf: test with a very short loop Eduard Zingerman
From: Eduard Zingerman @ 2024-10-18  2:03 UTC (permalink / raw)
  To: bpf, ast
  Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song,
	Eduard Zingerman

A specifically crafted program might trick the verifier into growing a
very long jump history within a single bpf_verifier_state instance.
A very long jump history makes mark_chain_precision() unreasonably
slow, especially when the verifier processes a loop.

Mitigate this by forcing a new state in is_state_visited() when the
current state's jump history grows too long.

Use the same constant as in `skip_inf_loop_check`, but multiply it by
the arbitrarily chosen value 2 to account for the jump history
containing not only information about jumps, but also information
about stack access.

For an example of a problematic program, consider the code below.
Without this patch the verifier processes the example for ~15 minutes
before failing to allocate a big-enough chunk for jmp_history.

    0: r7 = *(u16 *)(r1 +0);
    1: r7 += 0x1ab064b9;
    2: if r7 & 0x702000 goto 1b;
    3: r7 &= 0x1ee60e;
    4: r7 += r1;
    5: if r7 s> 0x37d2 goto +0;
    6: r0 = 0;
    7: exit;

Perf profiling shows that most of the time (~95%) is spent in
mark_chain_precision().

The easiest way to explain why this program causes problems is to
apply the following patch:

    diff --git a/include/linux/bpf.h b/include/linux/bpf.h
    index 0c216e71cec7..4b4823961abe 100644
    --- a/include/linux/bpf.h
    +++ b/include/linux/bpf.h
    @@ -1926,7 +1926,7 @@ struct bpf_array {
            };
     };

    -#define BPF_COMPLEXITY_LIMIT_INSNS      1000000 /* yes. 1M insns */
    +#define BPF_COMPLEXITY_LIMIT_INSNS      256 /* yes. 1M insns */
     #define MAX_TAIL_CALL_CNT 33

     /* Maximum number of loops for bpf_loop and bpf_iter_num.
    diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
    index f514247ba8ba..75e88be3bb3e 100644
    --- a/kernel/bpf/verifier.c
    +++ b/kernel/bpf/verifier.c
    @@ -18024,8 +18024,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
     skip_inf_loop_check:
                            if (!force_new_state &&
                                env->jmps_processed - env->prev_jmps_processed < 20 &&
    -                           env->insn_processed - env->prev_insn_processed < 100)
    +                           env->insn_processed - env->prev_insn_processed < 100) {
    +                               verbose(env, "is_state_visited: suppressing checkpoint at %d, %d jmps processed, cur->jmp_history_cnt is %d\n",
    +                                       env->insn_idx,
    +                                       env->jmps_processed - env->prev_jmps_processed,
    +                                       cur->jmp_history_cnt);
                                    add_new_state = false;
    +                       }
                            goto miss;
                    }
                    /* If sl->state is a part of a loop and this loop's entry is a part of
    @@ -18142,6 +18147,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
            if (!add_new_state)
                    return 0;

    +       verbose(env, "is_state_visited: new checkpoint at %d, resetting env->jmps_processed\n",
    +               env->insn_idx);
    +
            /* There were no equivalent states, remember the current one.
             * Technically the current state is not proven to be safe yet,
             * but it will either reach outer most bpf_exit (which means it's safe)

And observe verification log:

    ...
    is_state_visited: new checkpoint at 5, resetting env->jmps_processed
    5: R1=ctx() R7=ctx(...)
    5: (65) if r7 s> 0x37d2 goto pc+0     ; R7=ctx(...)
    6: (b7) r0 = 0                        ; R0_w=0
    7: (95) exit

    from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0
    6: R1=ctx() R7=ctx(...) R10=fp0
    6: (b7) r0 = 0                        ; R0_w=0
    7: (95) exit
    is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur->jmp_history_cnt is 74

    from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0
    1: R1=ctx() R7_w=scalar(...) R10=fp0
    1: (07) r7 += 447767737
    is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur->jmp_history_cnt is 75
    2: R7_w=scalar(...)
    2: (45) if r7 & 0x702000 goto pc-2
    ... mark_precise 152 steps for r7 ...
    2: R7_w=scalar(...)
    is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur->jmp_history_cnt is 75
    1: (07) r7 += 447767737
    is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur->jmp_history_cnt is 76
    2: R7_w=scalar(...)
    2: (45) if r7 & 0x702000 goto pc-2
    ...
    BPF program is too large. Processed 257 insn

The log output shows that the checkpoint at label (1) is never
created, because it is suppressed by the `skip_inf_loop_check` logic:
a. When the 'if' at (2) is processed, it pushes a state with insn_idx
   (1) onto the stack and proceeds to (3);
b. At (5) a checkpoint is created, which resets
   env->{jmps,insns}_processed;
c. Verification proceeds and reaches `exit`;
d. The state saved at step (a) is popped from the stack, and
   is_state_visited() considers whether a checkpoint needs to be
   added; but because env->{jmps,insns}_processed had just been reset
   at step (b), the `skip_inf_loop_check` logic forces
   `add_new_state` to false;
e. The verifier proceeds with the current state, which slowly
   accumulates more and more entries in its jump history.

The accumulation of entries in the jump history is a problem because
of two factors:
- it eventually exhausts the memory available for kmalloc() allocation;
- mark_chain_precision() traverses the jump history of a state,
  meaning that if `r7` is marked precise, the verifier iterates over
  the ever-growing jump history until the parent state boundary is
  reached.

(Note: the log also shows a REG INVARIANTS VIOLATION warning
 upon jset processing, but that is a separate bug to fix.)

With this patch applied, the example above is rejected by the verifier
in under 1 second, after reaching the 1M instruction limit.

The program is a simplified reproducer from the syzbot report [1].
A previous discussion can be found at [2].
The patch does not cause any changes in verification performance when
tested on selftests from veristat.cfg and on cilium programs taken
from [3].

[1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/
[2] https://lore.kernel.org/bpf/20241009021254.2805446-1-eddyz87@gmail.com/
[3] https://github.com/anakryiko/cilium

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 kernel/bpf/verifier.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f514247ba8ba..f64c831a9278 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17873,13 +17873,23 @@ static bool iter_active_depths_differ(struct bpf_verifier_state *old, struct bpf
 	return false;
 }
 
+#define MAX_JMPS_PER_STATE 20
+
 static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 {
 	struct bpf_verifier_state_list *new_sl;
 	struct bpf_verifier_state_list *sl, **pprev;
 	struct bpf_verifier_state *cur = env->cur_state, *new, *loop_entry;
 	int i, j, n, err, states_cnt = 0;
-	bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx);
+	bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx) ||
+			       /* - Long jmp history hinders mark_chain_precision performance,
+				*   so force new state if jmp history of current state exceeds
+				*   a threshold.
+				* - Jmp history records not only jumps, but also stack access,
+				*   so keep this constant 2x times the limit imposed on
+				*   env->jmps_processed for loop cases (see skip_inf_loop_check).
+				*/
+			       cur->jmp_history_cnt > MAX_JMPS_PER_STATE * 2;
 	bool add_new_state = force_new_state;
 	bool force_exact;
 
@@ -18023,7 +18033,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 */
 skip_inf_loop_check:
 			if (!force_new_state &&
-			    env->jmps_processed - env->prev_jmps_processed < 20 &&
+			    env->jmps_processed - env->prev_jmps_processed < MAX_JMPS_PER_STATE &&
 			    env->insn_processed - env->prev_insn_processed < 100)
 				add_new_state = false;
 			goto miss;
-- 
2.46.2


* [PATCH bpf-next v1 1/2] bpf: force checkpoints at loop back-edges
@ 2024-10-09  2:12 Eduard Zingerman
  2024-10-09  2:12 ` [PATCH bpf-next v1 2/2] selftests/bpf: test with a very short loop Eduard Zingerman
From: Eduard Zingerman @ 2024-10-09  2:12 UTC (permalink / raw)
  To: bpf, ast
  Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song,
	Eduard Zingerman

In [1] syzbot reported an interesting BPF program.
Verification of this program takes a very long time.

[1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/

The program could be simplified to the following snippet:

    /* Program type is kprobe */
       r7 = *(u16 *)(r1 +0);
    1: r7 += 0x1ab064b9;
       if r7 & 0x702000 goto 1b;
       r7 &= 0x1ee60e;
       r7 += r1;
       if r7 s> 0x37d2 goto +0;
       r0 = 0;
       exit;

The snippet exhibits the following behaviour depending on
BPF_COMPLEXITY_LIMIT_INSNS:
- at 1,000,000 verification does not finish in 15 minutes;
- at 100,000 verification finishes in 15 seconds;
- at 100 it is possible to get some verifier log.

Perf report collected for verification of the snippet:

    99.34%        veristat
       99.34%        [kernel.kallsyms]
          71.25%        [k] __mark_chain_precision
          24.81%        [k] bt_sync_linked_regs
          ...

Fragment of the log collected with instruction limit set to 100:

    1: (07) r7 += 447767737               ; R7_w=scalar(...)
    2: (45) if r7 & 0x702000 goto pc-2
    mark_precise: frame0: last_idx 2 first_idx 2 subseq_idx -1
    mark_precise: frame0: regs=r7 stack= before 1: (07) r7 += 447767737
    mark_precise: frame0: regs=r7 stack= before 2: (45) if r7 & 0x702000 goto pc-2
      ... repeats 25 times ...
    mark_precise: frame0: regs=r7 stack= before 1: (07) r7 += 447767737
    mark_precise: frame0: regs=r7 stack= before 2: (45) if r7 & 0x702000 goto pc-2
    mark_precise: frame0: parent state regs= stack=:  R1_r=ctx() R7_rw=Pscalar(...) R10=fp0
    2: R7_w=scalar(...)
    1: (07) r7 += 447767737

This shows very long backtracking walks done by mark_chain_precision()
inside of the very small loop. Such backtracking walks terminate at
checkpoint state boundaries, but checkpoints are introduced too
infrequently for this loop.

This patch forcibly enables checkpoints for each loop back-edge.
This helps with the programs in question: verification of both the
syzbot program and the reduced snippet finishes in ~2.5 sec.

However, this comes with some cost. Here are veristat results for
selftests listed in veristat.cfg and cilium BPF programs from [2].

[2] https://github.com/anakryiko/cilium

$ ./veristat -e file,prog,duration,insns,states -f 'states_pct>5' \
  -C master-baseline.log this-patch.log
File                        Program                               Duration (us) (DIFF)  Insns     (DIFF)  States       (DIFF)
--------------------------  ------------------------------------  --------------------  ----------------  -------------------
loop1.bpf.o                 nested_loops                              +52014 (+40.34%)       +0 (+0.00%)    +39649 (+720.37%)
loop2.bpf.o                 while_true                                +1308 (+161.48%)  +2861 (+160.46%)      +316 (+554.39%)
loop3.bpf.o                 while_true                              +350359 (+106.88%)       +0 (+0.00%)  +101448 (+1049.86%)
loop4.bpf.o                 combinations                                 -88 (-34.78%)    -221 (-42.18%)        +13 (+72.22%)
loop5.bpf.o                 while_true                                    +5 (+10.64%)       +3 (+3.57%)         +2 (+22.22%)
loop6.bpf.o                 trace_virtqueue_add_sgs                    -1886 (-34.01%)   -3717 (-37.38%)         +18 (+8.61%)
netif_receive_skb.bpf.o     trace_netif_receive_skb                        +5 (+0.04%)       +0 (+0.00%)      +562 (+111.29%)
profiler2.bpf.o             kprobe__vfs_link                           -1132 (-12.50%)    -1142 (-8.41%)       +158 (+53.20%)
profiler2.bpf.o             kprobe__vfs_symlink                        -1059 (-17.39%)   -1173 (-13.63%)        +77 (+37.75%)
profiler2.bpf.o             kprobe_ret__do_filp_open                     -465 (-8.40%)     -832 (-9.91%)        +84 (+40.00%)
profiler2.bpf.o             raw_tracepoint__sched_process_exit          -524 (-29.91%)   -1050 (-33.89%)          -5 (-6.02%)
profiler2.bpf.o             tracepoint__syscalls__sys_enter_kill       -4977 (-22.00%)   -4585 (-18.96%)         +70 (+6.75%)
pyperf600.bpf.o             on_event                                   +16312 (+0.81%)   -53870 (-9.81%)      -3000 (-10.16%)
strobemeta_nounroll1.bpf.o  on_event                                  -16609 (-40.28%)  -20944 (-38.87%)       -294 (-18.62%)
strobemeta_nounroll2.bpf.o  on_event                                  -22022 (-23.60%)  -40646 (-34.23%)       -578 (-15.41%)
strobemeta_subprogs.bpf.o   on_event                                  -17414 (-33.30%)  -19758 (-36.47%)       -185 (-12.33%)
test_sysctl_loop1.bpf.o     sysctl_tcp_mem                                +76 (+8.05%)      +60 (+4.33%)       +71 (+273.08%)
test_sysctl_loop2.bpf.o     sysctl_tcp_mem                              +156 (+13.38%)      +77 (+5.13%)       +82 (+282.76%)
xdp_synproxy_kern.bpf.o     syncookie_tc                               +8711 (+52.72%)   +3610 (+26.23%)       +209 (+53.87%)
xdp_synproxy_kern.bpf.o     syncookie_xdp                              +8571 (+50.01%)   +3610 (+26.10%)       +209 (+53.73%)

The test case 'state_loop_first_last_equal' needs an update because
checkpoint placement for it has changed (there is now a forced
checkpoint at 'l0'). The goal of the test is to check that there is a
jump history entry with the same last_idx/first_idx pair.
Update the test by moving the 'if' instruction to 'l0', so that the
total number of checkpoints in the test remains the same (but the
interesting last_idx/first_idx pair is now 1/1 instead of 4/4).

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 kernel/bpf/verifier.c                         |  4 +++
 .../selftests/bpf/progs/verifier_precision.c  | 25 +++++++++++--------
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7d9b38ffd220..3fa517b3fb57 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16055,9 +16055,11 @@ static int push_insn(int t, int w, int e, struct bpf_verifier_env *env)
 	int *insn_state = env->cfg.insn_state;
 
 	if (e == FALLTHROUGH && insn_state[t] >= (DISCOVERED | FALLTHROUGH))
+		/* FALLTHROUGH from t already processed */
 		return DONE_EXPLORING;
 
 	if (e == BRANCH && insn_state[t] >= (DISCOVERED | BRANCH))
+		/* BRANCH from t already processed */
 		return DONE_EXPLORING;
 
 	if (w < 0 || w >= env->prog->len) {
@@ -16081,6 +16083,8 @@ static int push_insn(int t, int w, int e, struct bpf_verifier_env *env)
 		insn_stack[env->cfg.cur_stack++] = w;
 		return KEEP_EXPLORING;
 	} else if ((insn_state[w] & 0xF0) == DISCOVERED) {
+		/* back-edge */
+		mark_force_checkpoint(env, w);
 		if (env->bpf_capable)
 			return DONE_EXPLORING;
 		verbose_linfo(env, t, "%d: ", t);
diff --git a/tools/testing/selftests/bpf/progs/verifier_precision.c b/tools/testing/selftests/bpf/progs/verifier_precision.c
index 6b564d4c0986..7215c0cc0ccb 100644
--- a/tools/testing/selftests/bpf/progs/verifier_precision.c
+++ b/tools/testing/selftests/bpf/progs/verifier_precision.c
@@ -95,34 +95,37 @@ __naked int bpf_end_bswap(void)
 SEC("?raw_tp")
 __success __log_level(2)
 /*
- * Without the bug fix there will be no history between "last_idx 3 first_idx 3"
+ * Without the bug fix there will be no history between "last_idx 1 first_idx 1"
  * and "parent state regs=" lines. "R0_w=6" parts are here to help anchor
  * expected log messages to the one specific mark_chain_precision operation.
  *
  * This is quite fragile: if verifier checkpointing heuristic changes, this
  * might need adjusting.
  */
-__msg("2: (07) r0 += 1                       ; R0_w=6")
-__msg("3: (35) if r0 >= 0xa goto pc+1")
-__msg("mark_precise: frame0: last_idx 3 first_idx 3 subseq_idx -1")
-__msg("mark_precise: frame0: regs=r0 stack= before 2: (07) r0 += 1")
-__msg("mark_precise: frame0: regs=r0 stack= before 1: (07) r0 += 1")
+__msg("3: (07) r0 += 1                       ; R0_w=6")
+__msg("4: (05) goto pc-4")
+__msg("1: (35) if r0 >= 0xa goto pc+3")
+__msg("mark_precise: frame0: last_idx 1 first_idx 1 subseq_idx -1")
+__msg("mark_precise: frame0: parent state regs=r0 stack=:  R0_rw=P6 R1=ctx() R10=fp0")
+__msg("mark_precise: frame0: last_idx 4 first_idx 1 subseq_idx 1")
 __msg("mark_precise: frame0: regs=r0 stack= before 4: (05) goto pc-4")
-__msg("mark_precise: frame0: regs=r0 stack= before 3: (35) if r0 >= 0xa goto pc+1")
-__msg("mark_precise: frame0: parent state regs= stack=:  R0_rw=P4")
-__msg("3: R0_w=6")
+__msg("mark_precise: frame0: regs=r0 stack= before 3: (07) r0 += 1")
+__msg("mark_precise: frame0: regs=r0 stack= before 2: (07) r0 += 1")
+__msg("mark_precise: frame0: regs=r0 stack= before 1: (35) if r0 >= 0xa goto pc+3")
+__msg("mark_precise: frame0: parent state regs= stack=:  R0_rw=P4 R1=ctx() R10=fp0")
+__msg("1: R0=6")
 __naked int state_loop_first_last_equal(void)
 {
 	asm volatile (
 		"r0 = 0;"
 	"l0_%=:"
-		"r0 += 1;"
-		"r0 += 1;"
 		/* every few iterations we'll have a checkpoint here with
 		 * first_idx == last_idx, potentially confusing precision
 		 * backtracking logic
 		 */
 		"if r0 >= 10 goto l1_%=;"	/* checkpoint + mark_precise */
+		"r0 += 1;"
+		"r0 += 1;"
 		"goto l0_%=;"
 	"l1_%=:"
 		"exit;"
-- 
2.46.2


