* [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills
From: Andrii Nakryiko @ 2023-10-31  5:03 UTC
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Add support to the BPF verifier to track register spill/fill to/from the
stack regardless of whether it was done through the read-only R10 register
(the only form supported today) or through a general register after copying
R10 into it, while also potentially modifying the offset.

Once we add this generic register spill/fill support to precision
backtracking, we can take advantage of it to stop doing eager STACK_ZERO
conversion on register spill. Instead we can rely on the (im)precision of
the spilled const zero register to improve verifier state pruning
efficiency. Using a const zero register to initialize stack slots is very
common with __builtin_memset() usage or just zero-initializing variables on
the stack, and it causes unnecessary state duplication, because that
STACK_ZERO knowledge is often not needed for correctness: the zero values
are never used in a precise context. Thus, relying on register imprecision
helps tremendously, especially in real-world BPF programs.
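
As an illustration, a minimal sketch of this pattern (program structure and
names are hypothetical, not taken from this series):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct conn_stats { __u64 pkts, bytes, drops; };

  SEC("xdp")
  int zero_init_example(struct xdp_md *ctx)
  {
          struct conn_stats s;

          /* typically compiled into stores/spills of a zeroed register
           * into the stack; each such slot used to be tracked as precise
           * STACK_ZERO state
           */
          __builtin_memset(&s, 0, sizeof(s));

          s.pkts = 1; /* the zeroes are soon overwritten and never used
                       * in a precise context
                       */
          return s.pkts ? XDP_PASS : XDP_DROP;
  }

  char LICENSE[] SEC("license") = "GPL";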

To make a spilled const zero register behave completely equivalently to
STACK_ZERO, we need to improve a few other small pieces, which is done in
the second part of the patch set. See individual patches for details. There
are also two small bug fixes spotted during STACK_ZERO debugging.

Andrii Nakryiko (7):
  bpf: use common jump (instruction) history across all states
  bpf: support non-r10 register spill/fill to/from stack in precision
    tracking
  bpf: enforce precision for r0 on callback return
  bpf: fix check for attempt to corrupt spilled pointer
  bpf: preserve STACK_ZERO slots on partial reg spills
  bpf: preserve constant zero when doing partial register restore
  bpf: track aligned STACK_ZERO cases as imprecise spilled registers

 include/linux/bpf_verifier.h                  |  34 ++-
 kernel/bpf/verifier.c                         | 274 ++++++++++--------
 .../bpf/progs/verifier_subprog_precision.c    |  83 +++++-
 .../testing/selftests/bpf/verifier/precise.c  |  38 ++-
 4 files changed, 285 insertions(+), 144 deletions(-)

-- 
2.34.1



* [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
From: Andrii Nakryiko @ 2023-10-31  5:03 UTC
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Instead of allocating and copying jump history each time we enqueue
child verifier state, switch to a model where we use one common
dynamically sized array of instruction jumps across all states.

The key observation for proving this correct is that jmp_history is only
relevant while a state is active, which means it either is the current
state (and thus we are actively modifying jump history and no other state
can interfere with us) or it is a checkpointed state with some children
still active (either enqueued or being current).

In the latter case our portion of jump history is finalized and won't
change or grow, so as long as we keep it immutable until the state is
finalized, we are good.

Now, when a state is finalized and put into the state hash for potential
future pruning lookups, jump history is not used anymore. This is because
jump history is only used by precision marking logic, and we never modify
precision markings for finalized states.

So, instead of each state having its own small jump history, we keep one
global dynamically-sized jump history, where each state in the current DFS
path from root to active state remembers its own portion of jump history.
The current state can append to this history, but cannot modify any of its
parents' histories.

Because the jmp_history array can be grown through realloc, states don't
keep pointers; they instead maintain two indices, [start, end), into the
global jump history array. The end index is exclusive, so start == end
means there is no relevant jump history.
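
As a simplified sketch of this [start, end) convention (illustration only,
not actual verifier code):

  /* All states along the current DFS path share env->insn_hist:
   *
   *   env->insn_hist:  [ e0 e1 e2 | e3 e4 e5 | ...
   *                      parent's    current
   *                      portion     portion
   *
   * parent:  insn_hist_start == 0, insn_hist_end == 3
   * current: insn_hist_start == 3, insn_hist_end == 6
   *
   * The current state appends at insn_hist_end only; entries below its
   * insn_hist_start belong to parent states and stay immutable until
   * those states are finalized.
   */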

This should eliminate a lot of allocations and minimize overall memory
usage (but I haven't benchmarked on real hardware, and QEMU benchmarking
is too noisy).

Also, in the next patch we'll extend jump history to maintain additional
markings for some instructions even if there was no jump, so in preparation
for that, rename this to a more generic "instruction history".

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf_verifier.h |  8 +++--
 kernel/bpf/verifier.c        | 68 ++++++++++++++++--------------------
 2 files changed, 35 insertions(+), 41 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 24213a99cc79..b57696145111 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -309,7 +309,7 @@ struct bpf_func_state {
 	struct bpf_stack_state *stack;
 };
 
-struct bpf_idx_pair {
+struct bpf_insn_hist_entry {
 	u32 prev_idx;
 	u32 idx;
 };
@@ -397,8 +397,8 @@ struct bpf_verifier_state {
 	 * For most states jmp_history_cnt is [0-3].
 	 * For loops can go up to ~40.
 	 */
-	struct bpf_idx_pair *jmp_history;
-	u32 jmp_history_cnt;
+	u32 insn_hist_start;
+	u32 insn_hist_end;
 	u32 dfs_depth;
 };
 
@@ -666,6 +666,8 @@ struct bpf_verifier_env {
 	 * e.g., in reg_type_str() to generate reg_type string
 	 */
 	char tmp_str_buf[TMP_STR_BUF_LEN];
+	struct bpf_insn_hist_entry *insn_hist;
+	u32 insn_hist_cap;
 };
 
 __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 857d76694517..2905ce2e8b34 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1737,13 +1737,6 @@ static void free_func_state(struct bpf_func_state *state)
 	kfree(state);
 }
 
-static void clear_jmp_history(struct bpf_verifier_state *state)
-{
-	kfree(state->jmp_history);
-	state->jmp_history = NULL;
-	state->jmp_history_cnt = 0;
-}
-
 static void free_verifier_state(struct bpf_verifier_state *state,
 				bool free_self)
 {
@@ -1753,7 +1746,6 @@ static void free_verifier_state(struct bpf_verifier_state *state,
 		free_func_state(state->frame[i]);
 		state->frame[i] = NULL;
 	}
-	clear_jmp_history(state);
 	if (free_self)
 		kfree(state);
 }
@@ -1779,13 +1771,6 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	struct bpf_func_state *dst;
 	int i, err;
 
-	dst_state->jmp_history = copy_array(dst_state->jmp_history, src->jmp_history,
-					    src->jmp_history_cnt, sizeof(struct bpf_idx_pair),
-					    GFP_USER);
-	if (!dst_state->jmp_history)
-		return -ENOMEM;
-	dst_state->jmp_history_cnt = src->jmp_history_cnt;
-
 	/* if dst has more stack frames then src frame, free them, this is also
 	 * necessary in case of exceptional exits using bpf_throw.
 	 */
@@ -1802,6 +1787,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
 	dst_state->last_insn_idx = src->last_insn_idx;
+	dst_state->insn_hist_start = src->insn_hist_start;
+	dst_state->insn_hist_end = src->insn_hist_end;
 	dst_state->dfs_depth = src->dfs_depth;
 	dst_state->used_as_loop_entry = src->used_as_loop_entry;
 	for (i = 0; i <= src->curframe; i++) {
@@ -3495,40 +3482,44 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
 static int push_jmp_history(struct bpf_verifier_env *env,
 			    struct bpf_verifier_state *cur)
 {
-	u32 cnt = cur->jmp_history_cnt;
-	struct bpf_idx_pair *p;
+	struct bpf_insn_hist_entry *p;
 	size_t alloc_size;
 
 	if (!is_jmp_point(env, env->insn_idx))
 		return 0;
 
-	cnt++;
-	alloc_size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p)));
-	p = krealloc(cur->jmp_history, alloc_size, GFP_USER);
-	if (!p)
-		return -ENOMEM;
-	p[cnt - 1].idx = env->insn_idx;
-	p[cnt - 1].prev_idx = env->prev_insn_idx;
-	cur->jmp_history = p;
-	cur->jmp_history_cnt = cnt;
+	if (cur->insn_hist_end + 1 > env->insn_hist_cap) {
+		alloc_size = size_mul(cur->insn_hist_end + 1, sizeof(*p));
+		alloc_size = kmalloc_size_roundup(alloc_size);
+		p = krealloc(env->insn_hist, alloc_size, GFP_USER);
+		if (!p)
+			return -ENOMEM;
+		env->insn_hist = p;
+		env->insn_hist_cap = alloc_size / sizeof(*p);
+	}
+
+	p = &env->insn_hist[cur->insn_hist_end];
+	p->idx = env->insn_idx;
+	p->prev_idx = env->prev_insn_idx;
+	cur->insn_hist_end++;
 	return 0;
 }
 
 /* Backtrack one insn at a time. If idx is not at the top of recorded
  * history then previous instruction came from straight line execution.
  */
-static int get_prev_insn_idx(struct bpf_verifier_state *st, int i,
-			     u32 *history)
+static int get_prev_insn_idx(const struct bpf_verifier_env *env, int insn_idx,
+			     u32 hist_start, u32 *hist_endp)
 {
-	u32 cnt = *history;
+	u32 hist_end = *hist_endp;
 
-	if (cnt && st->jmp_history[cnt - 1].idx == i) {
-		i = st->jmp_history[cnt - 1].prev_idx;
-		(*history)--;
+	if (hist_end > hist_start && env->insn_hist[hist_end - 1].idx == insn_idx) {
+		insn_idx = env->insn_hist[hist_end - 1].prev_idx;
+		(*hist_endp)--;
 	} else {
-		i--;
+		insn_idx--;
 	}
-	return i;
+	return insn_idx;
 }
 
 static const char *disasm_kfunc_name(void *data, const struct bpf_insn *insn)
@@ -4200,7 +4191,7 @@ static int mark_precise_scalar_ids(struct bpf_verifier_env *env, struct bpf_veri
  * SCALARS, as well as any other registers and slots that contribute to
  * a tracked state of given registers/stack slots, depending on specific BPF
  * assembly instructions (see backtrack_insns() for exact instruction handling
- * logic). This backtracking relies on recorded jmp_history and is able to
+ * logic). This backtracking relies on recorded insn_history and is able to
  * traverse entire chain of parent states. This process ends only when all the
  * necessary registers/slots and their transitive dependencies are marked as
  * precise.
@@ -4317,7 +4308,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 
 	for (;;) {
 		DECLARE_BITMAP(mask, 64);
-		u32 history = st->jmp_history_cnt;
+		u32 hist_end = st->insn_hist_end;
 
 		if (env->log.level & BPF_LOG_LEVEL2) {
 			verbose(env, "mark_precise: frame%d: last_idx %d first_idx %d subseq_idx %d \n",
@@ -4399,7 +4390,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 			if (i == first_idx)
 				break;
 			subseq_idx = i;
-			i = get_prev_insn_idx(st, i, &history);
+			i = get_prev_insn_idx(env, i, st->insn_hist_start, &hist_end);
 			if (i >= env->prog->len) {
 				/* This can happen if backtracking reached insn 0
 				 * and there are still reg_mask or stack_mask
@@ -17109,8 +17100,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 
 	cur->parent = new;
 	cur->first_insn_idx = insn_idx;
+	cur->insn_hist_start = cur->insn_hist_end;
 	cur->dfs_depth = new->dfs_depth + 1;
-	clear_jmp_history(cur);
 	new_sl->next = *explored_state(env, insn_idx);
 	*explored_state(env, insn_idx) = new_sl;
 	/* connect new state to parentage chain. Current frame needs all
@@ -20807,6 +20798,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (!is_priv)
 		mutex_unlock(&bpf_verifier_lock);
 	vfree(env->insn_aux_data);
+	kvfree(env->insn_hist);
 err_free_env:
 	kfree(env);
 	return ret;
-- 
2.34.1



* [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
From: Andrii Nakryiko @ 2023-10-31  5:03 UTC
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Tao Lyu

Use the newly optimized instruction history to record instructions that
performed register spill/fill to/from the stack, regardless of whether this
was done through the read-only r10 register or any other register after
copying r10 into it *and* potentially adjusting the offset.
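
For example, a sequence like the following (mirroring the example removed
from the __mark_chain_precision() comment below) previously could not be
connected through the stack slot during backtracking:

  r3 = r10;                /* r3 aliases the frame pointer */
  *(u64 *)(r3 - 8) = r0;   /* spill through non-r10 register */
  r4 = *(u64 *)(r10 - 8);  /* fill through r10 from the same slot */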

To make this work reliably, we push extra per-instruction flags into the
instruction history, encoding the stack slot index (spi) and stack frame
number in 10 extra flag bits taken away from prev_idx in the instruction
history. We don't touch the idx field for maximum performance, as it's
checked most frequently during backtracking.
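
With the masks introduced in this patch, encoding and decoding boil down to
(as used in check_stack_write_fixed_off() and backtrack_insn() below):

  /* encode: frame number in bits 0-2, spi in bits 3-8, bit 9 marks the
   * instruction as a stack access
   */
  insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | frameno;

  /* decode during backtracking */
  spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
  fr  = hist->flags & INSN_F_FRAMENO_MASK;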

This change removes basically the last remaining practical limitation of
the precision backtracking logic in the BPF verifier. It fixes known
deficiencies, but also opens up new opportunities to reduce the number of
verified states, explored in the next patch.

There are only three differences in selftests' BPF object files according
to veristat, all in the positive direction (fewer states).

File                                    Program        Insns (A)  Insns (B)  Insns  (DIFF)  States (A)  States (B)  States (DIFF)
--------------------------------------  -------------  ---------  ---------  -------------  ----------  ----------  -------------
test_cls_redirect_dynptr.bpf.linked3.o  cls_redirect        2987       2864  -123 (-4.12%)         240         231    -9 (-3.75%)
xdp_synproxy_kern.bpf.linked3.o         syncookie_tc       82848      82661  -187 (-0.23%)        5107        5073   -34 (-0.67%)
xdp_synproxy_kern.bpf.linked3.o         syncookie_xdp      85116      84964  -152 (-0.18%)        5162        5130   -32 (-0.62%)

Reported-by: Tao Lyu <tao.lyu@epfl.ch>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf_verifier.h                  |  26 +++-
 kernel/bpf/verifier.c                         | 145 +++++++++---------
 .../bpf/progs/verifier_subprog_precision.c    |  83 +++++++++-
 .../testing/selftests/bpf/verifier/precise.c  |  38 +++--
 4 files changed, 197 insertions(+), 95 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index b57696145111..7940c0861198 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -309,12 +309,34 @@ struct bpf_func_state {
 	struct bpf_stack_state *stack;
 };
 
+#define MAX_CALL_FRAMES 8
+
+/* instruction history flags, used in bpf_insn_hist_entry.flags field */
+enum {
+	/* instruction references stack slot through PTR_TO_STACK register;
+	 * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
+	 * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512,
+	 * 8 bytes per slot, so slot index (spi) is [0, 63])
+	 */
+	INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */
+
+	INSN_F_SPI_MASK = 0x3f, /* 6 bits */
+	INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */
+
+	INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */
+};
+
+static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
+static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
+
 struct bpf_insn_hist_entry {
-	u32 prev_idx;
 	u32 idx;
+	/* insn idx can't be bigger than 1 million */
+	u32 prev_idx : 22;
+	/* special flags, e.g., whether insn is doing register stack spill/load */
+	u32 flags : 10;
 };
 
-#define MAX_CALL_FRAMES 8
 /* Maximum number of register states that can exist at once */
 #define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES)
 struct bpf_verifier_state {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2905ce2e8b34..fbb779583d52 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3479,14 +3479,20 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
 }
 
 /* for any branch, call, exit record the history of jmps in the given state */
-static int push_jmp_history(struct bpf_verifier_env *env,
-			    struct bpf_verifier_state *cur)
+static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
+			     int insn_flags)
 {
 	struct bpf_insn_hist_entry *p;
 	size_t alloc_size;
 
-	if (!is_jmp_point(env, env->insn_idx))
+	/* combine instruction flags if we already recorded this instruction */
+	if (cur->insn_hist_end > cur->insn_hist_start &&
+	    (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
+	    p->idx == env->insn_idx &&
+	    p->prev_idx == env->prev_insn_idx) {
+		p->flags |= insn_flags;
 		return 0;
+	}
 
 	if (cur->insn_hist_end + 1 > env->insn_hist_cap) {
 		alloc_size = size_mul(cur->insn_hist_end + 1, sizeof(*p));
@@ -3501,14 +3507,23 @@ static int push_jmp_history(struct bpf_verifier_env *env,
 	p = &env->insn_hist[cur->insn_hist_end];
 	p->idx = env->insn_idx;
 	p->prev_idx = env->prev_insn_idx;
+	p->flags = insn_flags;
 	cur->insn_hist_end++;
 	return 0;
 }
 
+static struct bpf_insn_hist_entry *get_hist_insn_entry(struct bpf_verifier_env *env,
+						       u32 hist_start, u32 hist_end, int insn_idx)
+{
+	if (hist_end > hist_start && env->insn_hist[hist_end - 1].idx == insn_idx)
+		return &env->insn_hist[hist_end - 1];
+	return NULL;
+}
+
 /* Backtrack one insn at a time. If idx is not at the top of recorded
  * history then previous instruction came from straight line execution.
  */
-static int get_prev_insn_idx(const struct bpf_verifier_env *env, int insn_idx,
+static int get_prev_insn_idx(struct bpf_verifier_env *env, int insn_idx,
 			     u32 hist_start, u32 *hist_endp)
 {
 	u32 hist_end = *hist_endp;
@@ -3649,9 +3664,14 @@ static inline bool bt_is_reg_set(struct backtrack_state *bt, u32 reg)
 	return bt->reg_masks[bt->frame] & (1 << reg);
 }
 
+static inline bool bt_is_frame_slot_set(struct backtrack_state *bt, u32 frame, u32 slot)
+{
+	return bt->stack_masks[frame] & (1ull << slot);
+}
+
 static inline bool bt_is_slot_set(struct backtrack_state *bt, u32 slot)
 {
-	return bt->stack_masks[bt->frame] & (1ull << slot);
+	return bt_is_frame_slot_set(bt, bt->frame, slot);
 }
 
 /* format registers bitmask, e.g., "r0,r2,r4" for 0x15 mask */
@@ -3703,7 +3723,7 @@ static void fmt_stack_mask(char *buf, ssize_t buf_sz, u64 stack_mask)
  *   - *was* processed previously during backtracking.
  */
 static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
-			  struct backtrack_state *bt)
+			  struct bpf_insn_hist_entry *hist, struct backtrack_state *bt)
 {
 	const struct bpf_insn_cbs cbs = {
 		.cb_call	= disasm_kfunc_name,
@@ -3716,7 +3736,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 	u8 mode = BPF_MODE(insn->code);
 	u32 dreg = insn->dst_reg;
 	u32 sreg = insn->src_reg;
-	u32 spi, i;
+	u32 spi, i, fr;
 
 	if (insn->code == 0)
 		return 0;
@@ -3772,20 +3792,15 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 		 * by 'precise' mark in corresponding register of this state.
 		 * No further tracking necessary.
 		 */
-		if (insn->src_reg != BPF_REG_FP)
+		if (!hist || !(hist->flags & INSN_F_STACK_ACCESS))
 			return 0;
-
 		/* dreg = *(u64 *)[fp - off] was a fill from the stack.
 		 * that [fp - off] slot contains scalar that needs to be
 		 * tracked with precision
 		 */
-		spi = (-insn->off - 1) / BPF_REG_SIZE;
-		if (spi >= 64) {
-			verbose(env, "BUG spi %d\n", spi);
-			WARN_ONCE(1, "verifier backtracking bug");
-			return -EFAULT;
-		}
-		bt_set_slot(bt, spi);
+		spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
+		fr = hist->flags & INSN_F_FRAMENO_MASK;
+		bt_set_frame_slot(bt, fr, spi);
 	} else if (class == BPF_STX || class == BPF_ST) {
 		if (bt_is_reg_set(bt, dreg))
 			/* stx & st shouldn't be using _scalar_ dst_reg
@@ -3794,17 +3809,13 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 			 */
 			return -ENOTSUPP;
 		/* scalars can only be spilled into stack */
-		if (insn->dst_reg != BPF_REG_FP)
+		if (!hist || !(hist->flags & INSN_F_STACK_ACCESS))
 			return 0;
-		spi = (-insn->off - 1) / BPF_REG_SIZE;
-		if (spi >= 64) {
-			verbose(env, "BUG spi %d\n", spi);
-			WARN_ONCE(1, "verifier backtracking bug");
-			return -EFAULT;
-		}
-		if (!bt_is_slot_set(bt, spi))
+		spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
+		fr = hist->flags & INSN_F_FRAMENO_MASK;
+		if (!bt_is_frame_slot_set(bt, fr, spi))
 			return 0;
-		bt_clear_slot(bt, spi);
+		bt_clear_frame_slot(bt, fr, spi);
 		if (class == BPF_STX)
 			bt_set_reg(bt, sreg);
 	} else if (class == BPF_JMP || class == BPF_JMP32) {
@@ -3848,10 +3859,14 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 					WARN_ONCE(1, "verifier backtracking bug");
 					return -EFAULT;
 				}
-				/* we don't track register spills perfectly,
-				 * so fallback to force-precise instead of failing */
-				if (bt_stack_mask(bt) != 0)
-					return -ENOTSUPP;
+				/* we are now tracking register spills correctly,
+				 * so any instance of leftover slots is a bug
+				 */
+				if (bt_stack_mask(bt) != 0) {
+					verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt));
+					WARN_ONCE(1, "verifier backtracking bug (subprog leftover stack slots)");
+					return -EFAULT;
+				}
 				/* propagate r1-r5 to the caller */
 				for (i = BPF_REG_1; i <= BPF_REG_5; i++) {
 					if (bt_is_reg_set(bt, i)) {
@@ -3879,8 +3894,11 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 				WARN_ONCE(1, "verifier backtracking bug");
 				return -EFAULT;
 			}
-			if (bt_stack_mask(bt) != 0)
-				return -ENOTSUPP;
+			if (bt_stack_mask(bt) != 0) {
+				verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt));
+				WARN_ONCE(1, "verifier backtracking bug (callback leftover stack slots)");
+				return -EFAULT;
+			}
 			/* clear r1-r5 in callback subprog's mask */
 			for (i = BPF_REG_1; i <= BPF_REG_5; i++)
 				bt_clear_reg(bt, i);
@@ -4308,7 +4326,8 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 
 	for (;;) {
 		DECLARE_BITMAP(mask, 64);
-		u32 hist_end = st->insn_hist_end;
+		u32 hist_start = st->insn_hist_start, hist_end = st->insn_hist_end;
+		struct bpf_insn_hist_entry *hist;
 
 		if (env->log.level & BPF_LOG_LEVEL2) {
 			verbose(env, "mark_precise: frame%d: last_idx %d first_idx %d subseq_idx %d \n",
@@ -4372,7 +4391,8 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 				err = 0;
 				skip_first = false;
 			} else {
-				err = backtrack_insn(env, i, subseq_idx, bt);
+				hist = get_hist_insn_entry(env, hist_start, hist_end, i);
+				err = backtrack_insn(env, i, subseq_idx, hist, bt);
 			}
 			if (err == -ENOTSUPP) {
 				mark_all_scalars_precise(env, env->cur_state);
@@ -4390,7 +4410,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 			if (i == first_idx)
 				break;
 			subseq_idx = i;
-			i = get_prev_insn_idx(env, i, st->insn_hist_start, &hist_end);
+			i = get_prev_insn_idx(env, i, hist_start, &hist_end);
 			if (i >= env->prog->len) {
 				/* This can happen if backtracking reached insn 0
 				 * and there are still reg_mask or stack_mask
@@ -4425,22 +4445,10 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 			bitmap_from_u64(mask, bt_frame_stack_mask(bt, fr));
 			for_each_set_bit(i, mask, 64) {
 				if (i >= func->allocated_stack / BPF_REG_SIZE) {
-					/* the sequence of instructions:
-					 * 2: (bf) r3 = r10
-					 * 3: (7b) *(u64 *)(r3 -8) = r0
-					 * 4: (79) r4 = *(u64 *)(r10 -8)
-					 * doesn't contain jmps. It's backtracked
-					 * as a single block.
-					 * During backtracking insn 3 is not recognized as
-					 * stack access, so at the end of backtracking
-					 * stack slot fp-8 is still marked in stack_mask.
-					 * However the parent state may not have accessed
-					 * fp-8 and it's "unallocated" stack space.
-					 * In such case fallback to conservative.
-					 */
-					mark_all_scalars_precise(env, env->cur_state);
-					bt_reset(bt);
-					return 0;
+					verbose(env, "BUG backtracking (stack slot %d, total slots %d)\n",
+						i, func->allocated_stack / BPF_REG_SIZE);
+					WARN_ONCE(1, "verifier backtracking bug (stack slot out of bounds)");
+					return -EFAULT;
 				}
 
 				if (!is_spilled_scalar_reg(&func->stack[i])) {
@@ -4605,7 +4613,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	int i, slot = -off - 1, spi = slot / BPF_REG_SIZE, err;
 	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
 	struct bpf_reg_state *reg = NULL;
-	u32 dst_reg = insn->dst_reg;
+	int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | state->frameno;
 
 	err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE));
 	if (err)
@@ -4646,17 +4654,6 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	mark_stack_slot_scratched(env, spi);
 	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
 	    !register_is_null(reg) && env->bpf_capable) {
-		if (dst_reg != BPF_REG_FP) {
-			/* The backtracking logic can only recognize explicit
-			 * stack slot address like [fp - 8]. Other spill of
-			 * scalar via different register has to be conservative.
-			 * Backtrack from here and mark all registers as precise
-			 * that contributed into 'reg' being a constant.
-			 */
-			err = mark_chain_precision(env, value_regno);
-			if (err)
-				return err;
-		}
 		save_register_state(state, spi, reg, size);
 		/* Break the relation on a narrowing spill. */
 		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
@@ -4668,6 +4665,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 		__mark_reg_known(&fake_reg, (u32)insn->imm);
 		fake_reg.type = SCALAR_VALUE;
 		save_register_state(state, spi, &fake_reg, size);
+		insn_flags = 0; /* not a register spill */
 	} else if (reg && is_spillable_regtype(reg->type)) {
 		/* register containing pointer is being spilled into stack */
 		if (size != BPF_REG_SIZE) {
@@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 
 		/* Mark slots affected by this stack write. */
 		for (i = 0; i < size; i++)
-			state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
-				type;
+			state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
+		insn_flags = 0; /* not a register spill */
 	}
+
+	if (insn_flags)
+		return push_insn_history(env, env->cur_state, insn_flags);
 	return 0;
 }
 
@@ -4908,6 +4909,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 	int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
 	struct bpf_reg_state *reg;
 	u8 *stype, type;
+	int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | reg_state->frameno;
 
 	stype = reg_state->stack[spi].slot_type;
 	reg = &reg_state->stack[spi].spilled_ptr;
@@ -4953,12 +4955,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 					return -EACCES;
 				}
 				mark_reg_unknown(env, state->regs, dst_regno);
+				insn_flags = 0; /* not restoring original register state */
 			}
 			state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
-			return 0;
-		}
-
-		if (dst_regno >= 0) {
+		} else if (dst_regno >= 0) {
 			/* restore register state from stack */
 			copy_register_state(&state->regs[dst_regno], reg);
 			/* mark reg as written since spilled pointer state likely
@@ -4994,7 +4994,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 		mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64);
 		if (dst_regno >= 0)
 			mark_reg_stack_read(env, reg_state, off, off + size, dst_regno);
+		insn_flags = 0; /* we are not restoring spilled register */
 	}
+	if (insn_flags)
+		return push_insn_history(env, env->cur_state, insn_flags);
 	return 0;
 }
 
@@ -7125,7 +7128,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 			       BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
 	if (err)
 		return err;
-
 	return 0;
 }
 
@@ -17001,7 +17003,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * the precision needs to be propagated back in
 			 * the current state.
 			 */
-			err = err ? : push_jmp_history(env, cur);
+			if (is_jmp_point(env, env->insn_idx))
+				err = err ? : push_insn_history(env, cur, 0);
 			err = err ? : propagate_precision(env, &sl->state);
 			if (err)
 				return err;
@@ -17265,7 +17268,7 @@ static int do_check(struct bpf_verifier_env *env)
 		}
 
 		if (is_jmp_point(env, env->insn_idx)) {
-			err = push_jmp_history(env, state);
+			err = push_insn_history(env, state, 0);
 			if (err)
 				return err;
 		}
diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
index db6b3143338b..88c4207c6b4c 100644
--- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
+++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
@@ -487,7 +487,24 @@ __success __log_level(2)
  * so we won't be able to mark stack slot fp-8 as precise, and so will
  * fallback to forcing all as precise
  */
-__msg("mark_precise: frame0: falling back to forcing all scalars precise")
+__msg("10: (0f) r1 += r7")
+__msg("mark_precise: frame0: last_idx 10 first_idx 7 subseq_idx -1")
+__msg("mark_precise: frame0: regs=r7 stack= before 9: (bf) r1 = r8")
+__msg("mark_precise: frame0: regs=r7 stack= before 8: (27) r7 *= 4")
+__msg("mark_precise: frame0: regs=r7 stack= before 7: (79) r7 = *(u64 *)(r10 -8)")
+__msg("mark_precise: frame0: parent state regs= stack=-8:  R0_w=2 R6_w=1 R8_rw=map_value(off=0,ks=4,vs=16,imm=0) R10=fp0 fp-8_rw=P1")
+__msg("mark_precise: frame0: last_idx 18 first_idx 0 subseq_idx 7")
+__msg("mark_precise: frame0: regs= stack=-8 before 18: (95) exit")
+__msg("mark_precise: frame1: regs= stack= before 17: (0f) r0 += r2")
+__msg("mark_precise: frame1: regs= stack= before 16: (79) r2 = *(u64 *)(r1 +0)")
+__msg("mark_precise: frame1: regs= stack= before 15: (79) r0 = *(u64 *)(r10 -16)")
+__msg("mark_precise: frame1: regs= stack= before 14: (7b) *(u64 *)(r10 -16) = r2")
+__msg("mark_precise: frame1: regs= stack= before 13: (7b) *(u64 *)(r1 +0) = r2")
+__msg("mark_precise: frame1: regs=r2 stack= before 6: (85) call pc+6")
+__msg("mark_precise: frame0: regs=r2 stack= before 5: (bf) r2 = r6")
+__msg("mark_precise: frame0: regs=r6 stack= before 4: (07) r1 += -8")
+__msg("mark_precise: frame0: regs=r6 stack= before 3: (bf) r1 = r10")
+__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
 __naked int subprog_spill_into_parent_stack_slot_precise(void)
 {
 	asm volatile (
@@ -522,14 +539,68 @@ __naked int subprog_spill_into_parent_stack_slot_precise(void)
 	);
 }
 
-__naked __noinline __used
-static __u64 subprog_with_checkpoint(void)
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("17: (0f) r1 += r0")
+__msg("mark_precise: frame0: last_idx 17 first_idx 0 subseq_idx -1")
+__msg("mark_precise: frame0: regs=r0 stack= before 16: (bf) r1 = r7")
+__msg("mark_precise: frame0: regs=r0 stack= before 15: (27) r0 *= 4")
+__msg("mark_precise: frame0: regs=r0 stack= before 14: (79) r0 = *(u64 *)(r10 -16)")
+__msg("mark_precise: frame0: regs= stack=-16 before 13: (7b) *(u64 *)(r7 -8) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 12: (79) r0 = *(u64 *)(r8 +16)")
+__msg("mark_precise: frame0: regs= stack=-16 before 11: (7b) *(u64 *)(r8 +16) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 10: (79) r0 = *(u64 *)(r7 -8)")
+__msg("mark_precise: frame0: regs= stack=-16 before 9: (7b) *(u64 *)(r10 -16) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 8: (07) r8 += -32")
+__msg("mark_precise: frame0: regs=r0 stack= before 7: (bf) r8 = r10")
+__msg("mark_precise: frame0: regs=r0 stack= before 6: (07) r7 += -8")
+__msg("mark_precise: frame0: regs=r0 stack= before 5: (bf) r7 = r10")
+__msg("mark_precise: frame0: regs=r0 stack= before 21: (95) exit")
+__msg("mark_precise: frame1: regs=r0 stack= before 20: (bf) r0 = r1")
+__msg("mark_precise: frame1: regs=r1 stack= before 4: (85) call pc+15")
+__msg("mark_precise: frame0: regs=r1 stack= before 3: (bf) r1 = r6")
+__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
+__naked int stack_slot_aliases_precision(void)
 {
 	asm volatile (
-		"r0 = 0;"
-		/* guaranteed checkpoint if BPF_F_TEST_STATE_FREQ is used */
-		"goto +0;"
+		"r6 = 1;"
+		/* pass r6 through r1 into subprog to get it back as r0;
+		 * this whole chain will have to be marked as precise later
+		 */
+		"r1 = r6;"
+		"call identity_subprog;"
+		/* let's setup two registers that are aliased to r10 */
+		"r7 = r10;"
+		"r7 += -8;"			/* r7 = r10 - 8 */
+		"r8 = r10;"
+		"r8 += -32;"			/* r8 = r10 - 32 */
+		/* now spill subprog's return value (a r6 -> r1 -> r0 chain)
+		 * a few times through different stack pointer regs, making
+		 * sure to use r10, r7, and r8 both in LDX and STX insns, and
+		 * *importantly* also using a combination of const var_off and
+		 * insn->off to validate that we record final stack slot
+		 * correctly, instead of relying on just insn->off derivation,
+		 * which is only valid for r10-based stack offset
+		 */
+		"*(u64 *)(r10 - 16) = r0;"
+		"r0 = *(u64 *)(r7 - 8);"	/* r7 - 8 == r10 - 16 */
+		"*(u64 *)(r8 + 16) = r0;"	/* r8 + 16 = r10 - 16 */
+		"r0 = *(u64 *)(r8 + 16);"
+		"*(u64 *)(r7 - 8) = r0;"
+		"r0 = *(u64 *)(r10 - 16);"
+		/* get ready to use r0 as an index into array to force precision */
+		"r0 *= 4;"
+		"r1 = %[vals];"
+		/* here r0->r1->r6 chain is forced to be precise and has to be
+		 * propagated back to the beginning, including through the
+		 * subprog call and all the stack spills and loads
+		 */
+		"r1 += r0;"
+		"r0 = *(u32 *)(r1 + 0);"
 		"exit;"
+		:
+		: __imm_ptr(vals)
+		: __clobber_common, "r6"
 	);
 }
 
diff --git a/tools/testing/selftests/bpf/verifier/precise.c b/tools/testing/selftests/bpf/verifier/precise.c
index 0d84dd1f38b6..8a2ff81d8350 100644
--- a/tools/testing/selftests/bpf/verifier/precise.c
+++ b/tools/testing/selftests/bpf/verifier/precise.c
@@ -140,10 +140,11 @@
 	.result = REJECT,
 },
 {
-	"precise: ST insn causing spi > allocated_stack",
+	"precise: ST zero to stack insn is supported",
 	.insns = {
 	BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
+	/* not a register spill, so we stop precision propagation for R4 here */
 	BPF_ST_MEM(BPF_DW, BPF_REG_3, -8, 0),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
 	BPF_MOV64_IMM(BPF_REG_0, -1),
@@ -157,11 +158,11 @@
 	mark_precise: frame0: last_idx 4 first_idx 2\
 	mark_precise: frame0: regs=r4 stack= before 4\
 	mark_precise: frame0: regs=r4 stack= before 3\
-	mark_precise: frame0: regs= stack=-8 before 2\
-	mark_precise: frame0: falling back to forcing all scalars precise\
-	force_precise: frame0: forcing r0 to be precise\
 	mark_precise: frame0: last_idx 5 first_idx 5\
-	mark_precise: frame0: parent state regs= stack=:",
+	mark_precise: frame0: parent state regs=r0 stack=:\
+	mark_precise: frame0: last_idx 4 first_idx 2\
+	mark_precise: frame0: regs=r0 stack= before 4\
+	5: R0=-1 R4=0",
 	.result = VERBOSE_ACCEPT,
 	.retval = -1,
 },
@@ -169,6 +170,8 @@
 	"precise: STX insn causing spi > allocated_stack",
 	.insns = {
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	/* make later reg spill more interesting by having somewhat known scalar */
+	BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xff),
 	BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
 	BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, -8),
@@ -179,18 +182,21 @@
 	},
 	.prog_type = BPF_PROG_TYPE_XDP,
 	.flags = BPF_F_TEST_STATE_FREQ,
-	.errstr = "mark_precise: frame0: last_idx 6 first_idx 6\
+	.errstr = "mark_precise: frame0: last_idx 7 first_idx 7\
 	mark_precise: frame0: parent state regs=r4 stack=:\
-	mark_precise: frame0: last_idx 5 first_idx 3\
-	mark_precise: frame0: regs=r4 stack= before 5\
-	mark_precise: frame0: regs=r4 stack= before 4\
-	mark_precise: frame0: regs= stack=-8 before 3\
-	mark_precise: frame0: falling back to forcing all scalars precise\
-	force_precise: frame0: forcing r0 to be precise\
-	force_precise: frame0: forcing r0 to be precise\
-	force_precise: frame0: forcing r0 to be precise\
-	force_precise: frame0: forcing r0 to be precise\
-	mark_precise: frame0: last_idx 6 first_idx 6\
+	mark_precise: frame0: last_idx 6 first_idx 4\
+	mark_precise: frame0: regs=r4 stack= before 6: (b7) r0 = -1\
+	mark_precise: frame0: regs=r4 stack= before 5: (79) r4 = *(u64 *)(r10 -8)\
+	mark_precise: frame0: regs= stack=-8 before 4: (7b) *(u64 *)(r3 -8) = r0\
+	mark_precise: frame0: parent state regs=r0 stack=:\
+	mark_precise: frame0: last_idx 3 first_idx 3\
+	mark_precise: frame0: regs=r0 stack= before 3: (55) if r3 != 0x7b goto pc+0\
+	mark_precise: frame0: regs=r0 stack= before 2: (bf) r3 = r10\
+	mark_precise: frame0: regs=r0 stack= before 1: (57) r0 &= 255\
+	mark_precise: frame0: parent state regs=r0 stack=:\
+	mark_precise: frame0: last_idx 0 first_idx 0\
+	mark_precise: frame0: regs=r0 stack= before 0: (85) call bpf_get_prandom_u32#7\
+	mark_precise: frame0: last_idx 7 first_idx 7\
 	mark_precise: frame0: parent state regs= stack=:",
 	.result = VERBOSE_ACCEPT,
 	.retval = -1,
-- 
2.34.1



* [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
From: Andrii Nakryiko @ 2023-10-31  5:03 UTC
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Given the verifier checks the actual returned value, r0 has to be precise,
so we need to propagate precision properly.
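
For example, a bpf_for_each_map_elem() callback is expected to return 0
(continue) or 1 (stop), and the verifier validates r0 against that range on
callback exit, so the scalars feeding into the returned value have to be
tracked precisely. A minimal hypothetical callback:

  static long check_elem(struct bpf_map *map, const void *key, void *value,
                         void *ctx)
  {
          /* r0 is checked against the [0, 1] range at callback exit,
           * so the values contributing to it need precision marks
           */
          return *(__u64 *)value > 100 ? 1 : 0;
  }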

Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fbb779583d52..098ba0e1a6ff 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9739,6 +9739,12 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 			verbose(env, "R0 not a scalar value\n");
 			return -EACCES;
 		}
+
+		/* we are going to enforce precise value, mark r0 precise */
+		err = mark_chain_precision(env, BPF_REG_0);
+		if (err)
+			return err;
+
 		if (!tnum_in(range, r0->var_off)) {
 			verbose_invalid_scalar(env, r0, &range, "callback return", "R0");
 			return -EINVAL;
-- 
2.34.1



* [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer
From: Andrii Nakryiko @ 2023-10-31  5:03 UTC
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

When a register is spilled onto the stack as a 1/2/4-byte register, we set
slot_type[BPF_REG_SIZE - 1] (plus potentially a few more below it,
depending on the actual spill size). So to check if some stack slot has a
spilled register, we need to consult slot_type[7], not slot_type[0].

To avoid the need to remember and double-check this in the future, just
use is_spilled_reg() helper.
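
To illustrate the layout (sketch): after a 4-byte register spill to fp-8,
the 8-byte slot's type bytes look like this:

  /* slot_type[] of the fp-8 slot after a 4-byte spill:
   *
   *   [0] [1] [2] [3]  ->  untouched (MISC/ZERO/INVALID)
   *   [4] [5] [6] [7]  ->  STACK_SPILL
   *
   * so checking slot_type[0] == STACK_SPILL misses sub-register spills,
   * while is_spilled_reg() consults slot_type[BPF_REG_SIZE - 1], i.e. [7]
   */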

Fixes: 638f5b90d460 ("bpf: reduce verifier memory consumption")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 098ba0e1a6ff..82992c32c1bd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4622,7 +4622,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	 * so it's aligned access and [off, off + size) are within stack limits
 	 */
 	if (!env->allow_ptr_leaks &&
-	    state->stack[spi].slot_type[0] == STACK_SPILL &&
+	    is_spilled_reg(&state->stack[spi]) &&
 	    size != BPF_REG_SIZE) {
 		verbose(env, "attempt to corrupt spilled pointer on stack\n");
 		return -EACCES;
-- 
2.34.1



* [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills
From: Andrii Nakryiko @ 2023-10-31  5:03 UTC
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve them in
situations where this is possible. E.g., when spilling a register as a
1/2/4-byte subslot on the stack, the remaining bytes in the stack slot do
not automatically become unknown. If we knew they contained zeroes, we can
preserve those STACK_ZERO markers.
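
E.g., for a 2-byte register spill into a fully zeroed fp-8 slot (sketch):

  /* before: slot_type = { ZERO, ZERO, ZERO, ZERO, ZERO, ZERO, ZERO, ZERO }
   * old:    slot_type = { MISC, MISC, MISC, MISC, MISC, MISC, SPILL, SPILL }
   * new:    slot_type = { ZERO, ZERO, ZERO, ZERO, ZERO, ZERO, SPILL, SPILL }
   */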

Add a helper mark_stack_slot_misc(), similar to scrub_spilled_slot(), but
one that doesn't overwrite either STACK_INVALID or STACK_ZERO. Note that we
need to take into account the possibility of being in unprivileged mode, in
which case STACK_INVALID is forced to STACK_MISC for correctness, as
treating STACK_INVALID as equivalent to STACK_MISC is only enabled in
privileged mode.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 82992c32c1bd..0eecc6b3109c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1355,6 +1355,21 @@ static void scrub_spilled_slot(u8 *stype)
 		*stype = STACK_MISC;
 }
 
+/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
+ * case they are equivalent, or it's STACK_ZERO, in which case we preserve
+ * more precise STACK_ZERO.
+ * Note, in unprivileged mode leaving STACK_INVALID is wrong, so we take
+ * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
+ */
+static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
+{
+	if (*stype == STACK_ZERO)
+		return;
+	if (env->allow_ptr_leaks && *stype == STACK_INVALID)
+		return;
+	*stype = STACK_MISC;
+}
+
 static void print_scalar_ranges(struct bpf_verifier_env *env,
 				const struct bpf_reg_state *reg,
 				const char **sep)
@@ -4577,7 +4592,8 @@ static void copy_register_state(struct bpf_reg_state *dst, const struct bpf_reg_
 	dst->live = live;
 }
 
-static void save_register_state(struct bpf_func_state *state,
+static void save_register_state(struct bpf_verifier_env *env,
+				struct bpf_func_state *state,
 				int spi, struct bpf_reg_state *reg,
 				int size)
 {
@@ -4592,7 +4608,7 @@ static void save_register_state(struct bpf_func_state *state,
 
 	/* size < 8 bytes spill */
 	for (; i; i--)
-		scrub_spilled_slot(&state->stack[spi].slot_type[i - 1]);
+		mark_stack_slot_misc(env, &state->stack[spi].slot_type[i - 1]);
 }
 
 static bool is_bpf_st_mem(struct bpf_insn *insn)
@@ -4654,7 +4670,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	mark_stack_slot_scratched(env, spi);
 	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
 	    !register_is_null(reg) && env->bpf_capable) {
-		save_register_state(state, spi, reg, size);
+		save_register_state(env, state, spi, reg, size);
 		/* Break the relation on a narrowing spill. */
 		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
 			state->stack[spi].spilled_ptr.id = 0;
@@ -4664,7 +4680,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 
 		__mark_reg_known(&fake_reg, (u32)insn->imm);
 		fake_reg.type = SCALAR_VALUE;
-		save_register_state(state, spi, &fake_reg, size);
+		save_register_state(env, state, spi, &fake_reg, size);
 		insn_flags = 0; /* not a register spill */
 	} else if (reg && is_spillable_regtype(reg->type)) {
 		/* register containing pointer is being spilled into stack */
@@ -4677,7 +4693,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 			verbose(env, "cannot spill pointers to stack into stack frame of the caller\n");
 			return -EINVAL;
 		}
-		save_register_state(state, spi, reg, size);
+		save_register_state(env, state, spi, reg, size);
 	} else {
 		u8 type = STACK_MISC;
 
@@ -4948,6 +4964,8 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 						continue;
 					if (type == STACK_MISC)
 						continue;
+					if (type == STACK_ZERO)
+						continue;
 					if (type == STACK_INVALID && env->allow_uninit_stack)
 						continue;
 					verbose(env, "invalid read from stack off %d+%d size %d\n",
-- 
2.34.1



* [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore
From: Andrii Nakryiko @ 2023-10-31  5:03 UTC
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Similarly to the special handling of STACK_ZERO, when reading 1/2/4 bytes
from a stack slot that has a register spilled into it, and that register
has a constant zero value, preserve that zero and mark the spilled register
as precise for it. This makes the spilled const zero register and
STACK_ZERO cases equivalent in their behavior.
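
E.g. (sketch):

  r0 = 0;
  *(u64 *)(r10 - 8) = r0;   /* spill of a known-zero register */
  r1 = *(u32 *)(r10 - 8);   /* partial fill: r1 becomes a known zero and
                             * the spilled r0 is marked precise
                             */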

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0eecc6b3109c..8cfe060e4938 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4958,22 +4958,39 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 				copy_register_state(&state->regs[dst_regno], reg);
 				state->regs[dst_regno].subreg_def = subreg_def;
 			} else {
+				int spill_cnt = 0, zero_cnt = 0;
+
 				for (i = 0; i < size; i++) {
 					type = stype[(slot - i) % BPF_REG_SIZE];
-					if (type == STACK_SPILL)
+					if (type == STACK_SPILL) {
+						spill_cnt++;
 						continue;
+					}
 					if (type == STACK_MISC)
 						continue;
-					if (type == STACK_ZERO)
+					if (type == STACK_ZERO) {
+						zero_cnt++;
 						continue;
+					}
 					if (type == STACK_INVALID && env->allow_uninit_stack)
 						continue;
 					verbose(env, "invalid read from stack off %d+%d size %d\n",
 						off, i, size);
 					return -EACCES;
 				}
-				mark_reg_unknown(env, state->regs, dst_regno);
-				insn_flags = 0; /* not restoring original register state */
+
+				if (spill_cnt == size &&
+				    tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
+					__mark_reg_const_zero(&state->regs[dst_regno]);
+					/* this IS register fill, so keep insn_flags */
+				} else if (zero_cnt == size) {
+					/* similarly to mark_reg_stack_read(), preserve zeroes */
+					__mark_reg_const_zero(&state->regs[dst_regno]);
+					insn_flags = 0; /* not restoring original register state */
+				} else {
+					mark_reg_unknown(env, state->regs, dst_regno);
+					insn_flags = 0; /* not restoring original register state */
+				}
 			}
 			state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
 		} else if (dst_regno >= 0) {
-- 
2.34.1



* [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
From: Andrii Nakryiko @ 2023-10-31  5:03 UTC
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Now that precision backtracking supports register spill/fill to/from the
stack, there is another opportunity to be exploited here: minimizing
precise STACK_ZERO cases. With a simple code change we can rely on the
initially imprecise register spill tracking for cases when the register
spilled to the stack was a known zero.
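
E.g., an aligned 8-byte spill of a known-zero register (sketch):

  r0 = 0;
  *(u64 *)(r10 - 8) = r0;   /* now tracked as an (initially imprecise)
                             * spilled zero register instead of eight
                             * precise STACK_ZERO bytes
                             */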

This is a very common case for initializing variables on the stack,
including rather large structures. Often zero has no special meaning for
the subsequent BPF program logic and is overwritten with non-zero values
soon afterwards. But due to STACK_ZERO vs STACK_MISC tracking, such initial
zero-initialization actually causes duplication of verifier states, as
STACK_ZERO is clearly different from STACK_MISC or a spilled SCALAR_VALUE
register.

The effect of this (now) trivial change is huge, as can be seen below.
These are the differences for BPF selftests, Cilium, and Meta-internal BPF
object files relative to the previous patch in this series. You can see
improvements ranging from single-digit percentage reductions in
instructions and states all the way to 50-60% reductions for some of
Meta-internal host agent programs, and even for some Cilium programs.

For Meta-internal ones I left only the differences for the largest BPF
object files by states/instructions, as there were too many differences in
the overall output. All the differences were improvements, reducing the
number of states and thus instructions validated.

Note, Meta-internal BPF object file names are not printed below.
Many copies of balancer_ingress are actually many different
configurations of Katran, so they are different BPF programs, which
explains state reduction going from -16% all the way to 31%, depending
on BPF program logic complexity.

SELFTESTS
=========
File                                     Program                  Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
---------------------------------------  -----------------------  ---------  ---------  ---------------  ----------  ----------  -------------
bpf_iter_netlink.bpf.linked3.o           dump_netlink                   148        104    -44 (-29.73%)           8           5   -3 (-37.50%)
bpf_iter_unix.bpf.linked3.o              dump_unix                     8474       8404     -70 (-0.83%)         151         147    -4 (-2.65%)
bpf_loop.bpf.linked3.o                   stack_check                    560        324   -236 (-42.14%)          42          24  -18 (-42.86%)
local_storage_bench.bpf.linked3.o        get_local                      120         77    -43 (-35.83%)           9           6   -3 (-33.33%)
loop6.bpf.linked3.o                      trace_virtqueue_add_sgs      10167       9868    -299 (-2.94%)         226         206   -20 (-8.85%)
pyperf600_bpf_loop.bpf.linked3.o         on_event                      4872       3423  -1449 (-29.74%)         322         229  -93 (-28.88%)
strobemeta.bpf.linked3.o                 on_event                    180697     176036   -4661 (-2.58%)        4780        4734   -46 (-0.96%)
test_cls_redirect.bpf.linked3.o          cls_redirect                 65594      65401    -193 (-0.29%)        4230        4212   -18 (-0.43%)
test_global_func_args.bpf.linked3.o      test_cls                       145        136      -9 (-6.21%)          10           9   -1 (-10.00%)
test_l4lb.bpf.linked3.o                  balancer_ingress              4760       2612  -2148 (-45.13%)         113         102   -11 (-9.73%)
test_l4lb_noinline.bpf.linked3.o         balancer_ingress              4845       4877     +32 (+0.66%)         219         221    +2 (+0.91%)
test_l4lb_noinline_dynptr.bpf.linked3.o  balancer_ingress              2072       2087     +15 (+0.72%)          97          98    +1 (+1.03%)
test_seg6_loop.bpf.linked3.o             __add_egr_x                  12440       9975  -2465 (-19.82%)         364         353   -11 (-3.02%)
test_tcp_hdr_options.bpf.linked3.o       estab                         2558       2572     +14 (+0.55%)         179         180    +1 (+0.56%)
test_xdp_dynptr.bpf.linked3.o            _xdp_tx_iptunnel               645        596     -49 (-7.60%)          26          24    -2 (-7.69%)
test_xdp_noinline.bpf.linked3.o          balancer_ingress_v6           3520       3516      -4 (-0.11%)         216         216    +0 (+0.00%)
xdp_synproxy_kern.bpf.linked3.o          syncookie_tc                 82661      81241   -1420 (-1.72%)        5073        5155   +82 (+1.62%)
xdp_synproxy_kern.bpf.linked3.o          syncookie_xdp                84964      82297   -2667 (-3.14%)        5130        5157   +27 (+0.53%)

META-INTERNAL
=============
Program                                 Insns (A)  Insns (B)  Insns      (DIFF)  States (A)  States (B)  States   (DIFF)
--------------------------------------  ---------  ---------  -----------------  ----------  ----------  ---------------
balancer_ingress                            27925      23608    -4317 (-15.46%)        1488        1482      -6 (-0.40%)
balancer_ingress                            31824      27546    -4278 (-13.44%)        1658        1652      -6 (-0.36%)
balancer_ingress                            32213      27935    -4278 (-13.28%)        1689        1683      -6 (-0.36%)
balancer_ingress                            32213      27935    -4278 (-13.28%)        1689        1683      -6 (-0.36%)
balancer_ingress                            31824      27546    -4278 (-13.44%)        1658        1652      -6 (-0.36%)
balancer_ingress                            38647      29562    -9085 (-23.51%)        2069        1835   -234 (-11.31%)
balancer_ingress                            38647      29562    -9085 (-23.51%)        2069        1835   -234 (-11.31%)
balancer_ingress                            40339      30792    -9547 (-23.67%)        2193        1934   -259 (-11.81%)
balancer_ingress                            37321      29055    -8266 (-22.15%)        1972        1795    -177 (-8.98%)
balancer_ingress                            38176      29753    -8423 (-22.06%)        2008        1831    -177 (-8.81%)
balancer_ingress                            29193      20910    -8283 (-28.37%)        1599        1422   -177 (-11.07%)
balancer_ingress                            30013      21452    -8561 (-28.52%)        1645        1447   -198 (-12.04%)
balancer_ingress                            28691      24290    -4401 (-15.34%)        1545        1531     -14 (-0.91%)
balancer_ingress                            34223      28965    -5258 (-15.36%)        1984        1875    -109 (-5.49%)
balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
balancer_ingress                            35868      26455    -9413 (-26.24%)        2140        1827   -313 (-14.63%)
balancer_ingress                            35868      26455    -9413 (-26.24%)        2140        1827   -313 (-14.63%)
balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
balancer_ingress                            34844      29485    -5359 (-15.38%)        2036        1918    -118 (-5.80%)
fbflow_egress                                3256       2652     -604 (-18.55%)         218         192    -26 (-11.93%)
fbflow_ingress                               1026        944       -82 (-7.99%)          70          63     -7 (-10.00%)
sslwall_tc_egress                            8424       7360    -1064 (-12.63%)         498         458     -40 (-8.03%)
syar_accept_protect                         15040       9539    -5501 (-36.58%)         364         220   -144 (-39.56%)
syar_connect_tcp_v6                         15036       9535    -5501 (-36.59%)         360         216   -144 (-40.00%)
syar_connect_udp_v4                         15039       9538    -5501 (-36.58%)         361         217   -144 (-39.89%)
syar_connect_connect4_protect4              24805      15833    -8972 (-36.17%)         756         480   -276 (-36.51%)
syar_lsm_file_open                         167772     151813    -15959 (-9.51%)        1836        1667    -169 (-9.20%)
syar_namespace_create_new                   14805       9304    -5501 (-37.16%)         353         209   -144 (-40.79%)
syar_python3_detect                         17531      12030    -5501 (-31.38%)         391         247   -144 (-36.83%)
syar_ssh_post_fork                          16412      10911    -5501 (-33.52%)         405         261   -144 (-35.56%)
syar_enter_execve                           14728       9227    -5501 (-37.35%)         345         201   -144 (-41.74%)
syar_enter_execveat                         14728       9227    -5501 (-37.35%)         345         201   -144 (-41.74%)
syar_exit_execve                            16622      11121    -5501 (-33.09%)         376         232   -144 (-38.30%)
syar_exit_execveat                          16622      11121    -5501 (-33.09%)         376         232   -144 (-38.30%)
syar_syscalls_kill                          15288       9787    -5501 (-35.98%)         398         254   -144 (-36.18%)
syar_task_enter_pivot_root                  14898       9397    -5501 (-36.92%)         357         213   -144 (-40.34%)
syar_syscalls_setreuid                      16678      11177    -5501 (-32.98%)         429         285   -144 (-33.57%)
syar_syscalls_setuid                        16678      11177    -5501 (-32.98%)         429         285   -144 (-33.57%)
syar_syscalls_process_vm_readv              14959       9458    -5501 (-36.77%)         364         220   -144 (-39.56%)
syar_syscalls_process_vm_writev             15757      10256    -5501 (-34.91%)         390         246   -144 (-36.92%)
do_uprobe                                   15519      10018    -5501 (-35.45%)         373         229   -144 (-38.61%)
edgewall                                   179715      55783  -123932 (-68.96%)       12607        3999  -8608 (-68.28%)
bictcp_state                                 7570       4131    -3439 (-45.43%)         496         269   -227 (-45.77%)
cubictcp_state                               7570       4131    -3439 (-45.43%)         496         269   -227 (-45.77%)
tcp_rate_skb_delivered                        447        272     -175 (-39.15%)          29          18    -11 (-37.93%)
kprobe__bbr_set_state                        4566       2615    -1951 (-42.73%)         209         124    -85 (-40.67%)
kprobe__bictcp_state                         4566       2615    -1951 (-42.73%)         209         124    -85 (-40.67%)
inet_sock_set_state                          1501       1337     -164 (-10.93%)          93          85      -8 (-8.60%)
tcp_retransmit_skb                           1145        981     -164 (-14.32%)          67          59     -8 (-11.94%)
tcp_retransmit_synack                        1183        951     -232 (-19.61%)          67          55    -12 (-17.91%)
bpf_tcptuner                                 1459       1187     -272 (-18.64%)          99          80    -19 (-19.19%)
tw_egress                                     801        776       -25 (-3.12%)          69          66      -3 (-4.35%)
tw_ingress                                    795        770       -25 (-3.14%)          69          66      -3 (-4.35%)
ttls_tc_ingress                             19025      19383      +358 (+1.88%)         470         465      -5 (-1.06%)
ttls_nat_egress                               490        299     -191 (-38.98%)          33          20    -13 (-39.39%)
ttls_nat_ingress                              448        285     -163 (-36.38%)          32          21    -11 (-34.38%)
tw_twfw_egress                             511127     212071  -299056 (-58.51%)       16733        8504  -8229 (-49.18%)
tw_twfw_ingress                            500095     212069  -288026 (-57.59%)       16223        8504  -7719 (-47.58%)
tw_twfw_tc_eg                              511113     212064  -299049 (-58.51%)       16732        8504  -8228 (-49.18%)
tw_twfw_tc_in                              500095     212069  -288026 (-57.59%)       16223        8504  -7719 (-47.58%)
tw_twfw_egress                              12632      12435      -197 (-1.56%)         276         260     -16 (-5.80%)
tw_twfw_ingress                             12631      12454      -177 (-1.40%)         278         261     -17 (-6.12%)
tw_twfw_tc_eg                               12595      12435      -160 (-1.27%)         274         259     -15 (-5.47%)
tw_twfw_tc_in                               12631      12454      -177 (-1.40%)         278         261     -17 (-6.12%)
tw_xdp_dump                                   266        209      -57 (-21.43%)           9           8     -1 (-11.11%)

CILIUM
======
File           Program                           Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States  (DIFF)
-------------  --------------------------------  ---------  ---------  ----------------  ----------  ----------  --------------
bpf_host.o     cil_to_netdev                          6047       4578   -1469 (-24.29%)         362         249  -113 (-31.22%)
bpf_host.o     handle_lxc_traffic                     2227       1585    -642 (-28.83%)         156         103   -53 (-33.97%)
bpf_host.o     tail_handle_ipv4_from_netdev           2244       1458    -786 (-35.03%)         163         106   -57 (-34.97%)
bpf_host.o     tail_handle_nat_fwd_ipv4              21022      10479  -10543 (-50.15%)        1289         670  -619 (-48.02%)
bpf_host.o     tail_handle_nat_fwd_ipv6              15433      11375   -4058 (-26.29%)         905         643  -262 (-28.95%)
bpf_host.o     tail_ipv4_host_policy_ingress          2219       1367    -852 (-38.40%)         161          96   -65 (-40.37%)
bpf_host.o     tail_nodeport_nat_egress_ipv4         22460      19862   -2598 (-11.57%)        1469        1293  -176 (-11.98%)
bpf_host.o     tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
bpf_host.o     tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
bpf_host.o     tail_nodeport_nat_ipv6_egress          3702       3542     -160 (-4.32%)         215         205    -10 (-4.65%)
bpf_lxc.o      tail_handle_nat_fwd_ipv4              21022      10479  -10543 (-50.15%)        1289         670  -619 (-48.02%)
bpf_lxc.o      tail_handle_nat_fwd_ipv6              15433      11375   -4058 (-26.29%)         905         643  -262 (-28.95%)
bpf_lxc.o      tail_ipv4_ct_egress                    5073       3374   -1699 (-33.49%)         262         172   -90 (-34.35%)
bpf_lxc.o      tail_ipv4_ct_ingress                   5093       3385   -1708 (-33.54%)         262         172   -90 (-34.35%)
bpf_lxc.o      tail_ipv4_ct_ingress_policy_only       5093       3385   -1708 (-33.54%)         262         172   -90 (-34.35%)
bpf_lxc.o      tail_ipv6_ct_egress                    4593       3878    -715 (-15.57%)         194         151   -43 (-22.16%)
bpf_lxc.o      tail_ipv6_ct_ingress                   4606       3891    -715 (-15.52%)         194         151   -43 (-22.16%)
bpf_lxc.o      tail_ipv6_ct_ingress_policy_only       4606       3891    -715 (-15.52%)         194         151   -43 (-22.16%)
bpf_lxc.o      tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
bpf_lxc.o      tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
bpf_overlay.o  tail_handle_nat_fwd_ipv4              20524      10114  -10410 (-50.72%)        1271         638  -633 (-49.80%)
bpf_overlay.o  tail_nodeport_nat_egress_ipv4         22718      19490   -3228 (-14.21%)        1475        1275  -200 (-13.56%)
bpf_overlay.o  tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
bpf_overlay.o  tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
bpf_overlay.o  tail_nodeport_nat_ipv6_egress          3638       3548      -90 (-2.47%)         209         203     -6 (-2.87%)
bpf_overlay.o  tail_rev_nodeport_lb4                  4368       3820    -548 (-12.55%)         248         215   -33 (-13.31%)
bpf_overlay.o  tail_rev_nodeport_lb6                  2867       2428    -439 (-15.31%)         167         140   -27 (-16.17%)
bpf_sock.o     cil_sock6_connect                      1718       1703      -15 (-0.87%)         100          99     -1 (-1.00%)
bpf_xdp.o      tail_handle_nat_fwd_ipv4              12917      12443     -474 (-3.67%)         875         849    -26 (-2.97%)
bpf_xdp.o      tail_handle_nat_fwd_ipv6              13515      13264     -251 (-1.86%)         715         702    -13 (-1.82%)
bpf_xdp.o      tail_lb_ipv4                          39492      36367    -3125 (-7.91%)        2430        2251   -179 (-7.37%)
bpf_xdp.o      tail_lb_ipv6                          80441      78058    -2383 (-2.96%)        3647        3523   -124 (-3.40%)
bpf_xdp.o      tail_nodeport_ipv6_dsr                 1038        901    -137 (-13.20%)          61          55     -6 (-9.84%)
bpf_xdp.o      tail_nodeport_nat_egress_ipv4         13027      12096     -931 (-7.15%)         868         809    -59 (-6.80%)
bpf_xdp.o      tail_nodeport_nat_ingress_ipv4         7617       5900   -1717 (-22.54%)         522         413  -109 (-20.88%)
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6         7575       7395     -180 (-2.38%)         383         374     -9 (-2.35%)
bpf_xdp.o      tail_rev_nodeport_lb4                  6808       6739      -69 (-1.01%)         403         396     -7 (-1.74%)
bpf_xdp.o      tail_rev_nodeport_lb6                 16173      15847     -326 (-2.02%)        1010         990    -20 (-1.98%)
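
For reference, here is a minimal sketch of the stack zero-initialization
pattern this change targets (an illustrative program, not taken from the
selftests; the section name, struct layout, and helpers are just
examples):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  char LICENSE[] SEC("license") = "GPL";

  SEC("tracepoint/syscalls/sys_enter_openat")
  int handle_openat(void *ctx)
  {
  	/* Zero-initializing a stack variable typically compiles to
  	 * r1 = 0 followed by a series of 8-byte spills of r1:
  	 *
  	 *   r1 = 0
  	 *   *(u64 *)(r10 - 16) = r1
  	 *   *(u64 *)(r10 - 8) = r1
  	 *
  	 * Previously each such aligned slot was marked STACK_ZERO and
  	 * the spilled register forced precise; with this change the
  	 * slot is tracked as an imprecise spilled zero register, so
  	 * states differing only in such slots can be pruned.
  	 */
  	struct vals { long pid; long ts; } v = {};

  	v.pid = bpf_get_current_pid_tgid() >> 32;
  	v.ts = bpf_ktime_get_ns();
  	return v.pid != 0 && v.ts != 0;
  }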

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8cfe060e4938..e42ce974b106 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4668,8 +4668,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 		return err;
 
 	mark_stack_slot_scratched(env, spi);
-	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
-	    !register_is_null(reg) && env->bpf_capable) {
+	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) {
 		save_register_state(env, state, spi, reg, size);
 		/* Break the relation on a narrowing spill. */
 		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
@@ -4718,7 +4717,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 		/* when we zero initialize stack slots mark them as such */
 		if ((reg && register_is_null(reg)) ||
 		    (!reg && is_bpf_st_mem(insn) && insn->imm == 0)) {
-			/* backtracking doesn't work for STACK_ZERO yet. */
+			/* STACK_ZERO case happened because register spill
+			 * wasn't properly aligned at the stack slot boundary,
+			 * so it's not a register spill anymore; force
+			 * originating register to be precise to make
+			 * STACK_ZERO correct for subsequent states
+			 */
 			err = mark_chain_precision(env, value_regno);
 			if (err)
 				return err;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-10-31  5:03 ` [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
@ 2023-10-31  5:22   ` Andrii Nakryiko
  2023-11-01  7:56     ` Jiri Olsa
  2023-11-09 15:21   ` Eduard Zingerman
  1 sibling, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31  5:22 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team

On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Now that precision backtracking supports register spill/fill to/from the
> stack, there is another opportunity to exploit here: minimizing precise
> STACK_ZERO cases. With a simple code change we can rely on initially
> imprecise register spill tracking for cases when the register spilled to
> the stack was a known zero.
>
> This is a very common case when initializing variables on the stack,
> including rather large structures. Oftentimes zero has no special
> meaning for the subsequent BPF program logic and is overwritten with
> non-zero values soon afterwards. But due to STACK_ZERO vs STACK_MISC
> tracking, such initial zero-initialization actually causes duplication
> of verifier states, as STACK_ZERO is clearly different from STACK_MISC
> or a spilled SCALAR_VALUE register.
>
> The effect of this (now) trivial change is huge, as can be seen below.
> These are the differences for BPF selftests, Cilium, and Meta-internal
> BPF object files relative to the previous patch in this series. You can
> see improvements ranging from single-digit percentages for instructions
> and states all the way to 50-60% reductions for some Meta-internal host
> agent programs, and even for some Cilium programs.
>
> For the Meta-internal ones I kept only the differences for the largest
> BPF object files by states/instructions, as there were too many
> differences in the overall output. All the differences were
> improvements, reducing the number of states and thus the number of
> instructions validated.
>
> Note that Meta-internal BPF object file names are not printed below.
> The many copies of balancer_ingress are actually different
> configurations of Katran, so they are different BPF programs, which
> explains the state reduction ranging from -16% all the way to -31%,
> depending on BPF program logic complexity.
>
> [... SELFTESTS, META-INTERNAL, and CILIUM veristat tables snipped;
> identical to the tables in the patch above ...]

So I also want to mention that while I did spot-check a few programs
(not the biggest ones) and they did seem to have correct verification
flow, I obviously can't easily validate verifier log_level=2 logs for
all of the changes above, especially for those multi-thousand-state
programs. I'd really appreciate it if someone from Isovalent/Cilium
could sanity-check a Cilium program or two, just in case. Thanks!
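
In case it helps, the comparison can be reproduced roughly like this
(a sketch: the object file names are examples, and the exact flags
should be double-checked against veristat --help):

  # on a kernel without this series
  veristat -o csv bpf_host.o bpf_lxc.o > old.csv
  # on a kernel with this series applied
  veristat -o csv bpf_host.o bpf_lxc.o > new.csv
  # compare the two runs
  veristat -C old.csv new.csv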

> [... patch diff snipped; see the patch above ...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-10-31  5:22   ` Andrii Nakryiko
@ 2023-11-01  7:56     ` Jiri Olsa
  2023-11-01 16:27       ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Jiri Olsa @ 2023-11-01  7:56 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Mon, Oct 30, 2023 at 10:22:48PM -0700, Andrii Nakryiko wrote:
> On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > [... full patch description, veristat tables, and diff snipped ...]
> 
> So I also want to mention that while I did spot-check a few programs
> (not the biggest ones) and they did seem to have correct verification
> flow, I obviously can't easily validate verifier log_level=2 logs for
> all of the changes above, especially for those multi-thousand-state
> programs. I'd really appreciate it if someone from Isovalent/Cilium
> could sanity-check a Cilium program or two, just in case. Thanks!

fyi, I was curious so I tried this on top of the tetragon programs;
the results are a bit up and down, but verification time is mostly lower ;-)

jirka


---
$ veristat --compare veristat.old veristat.new

File                            Program                        Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)  Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States   (DIFF)  Peak states (A)  Peak states (B)  Peak states (DIFF)
------------------------------  -----------------------------  -----------------  -----------------  --------------------  ---------  ---------  ----------------  ----------  ----------  ---------------  ---------------  ---------------  ------------------
bpf_cgroup_mkdir.o              tg_tp_cgrp_mkdir                             206                190          -16 (-7.77%)        581        581       +0 (+0.00%)          24          24      +0 (+0.00%)               24               24         +0 (+0.00%)
bpf_cgroup_release.o            tg_tp_cgrp_release                           114                104          -10 (-8.77%)        381        381       +0 (+0.00%)          13          13      +0 (+0.00%)               13               13         +0 (+0.00%)
bpf_cgroup_rmdir.o              tg_tp_cgrp_rmdir                             126                121           -5 (-3.97%)        381        381       +0 (+0.00%)          13          13      +0 (+0.00%)               13               13         +0 (+0.00%)
bpf_execve_bprm_commit_creds.o  tg_kp_bprm_committing_creds                  100                 95           -5 (-5.00%)        163        163       +0 (+0.00%)          14          14      +0 (+0.00%)               14               14         +0 (+0.00%)
bpf_execve_event.o              event_execve                               12147              12843         +696 (+5.73%)      35096      34723     -373 (-1.06%)        2278        2251     -27 (-1.19%)             1110             1115         +5 (+0.45%)
bpf_execve_event.o              execve_send                                   93                 57         -36 (-38.71%)         82         82       +0 (+0.00%)           6           6      +0 (+0.00%)                6                6         +0 (+0.00%)
bpf_execve_event_v53.o          event_execve                               97457              98430         +973 (+1.00%)     245365     239363    -6002 (-2.45%)       15430       15334     -96 (-0.62%)             7994             7929        -65 (-0.81%)
bpf_execve_event_v53.o          execve_send                                   52                 54           +2 (+3.85%)        105        105       +0 (+0.00%)           5           5      +0 (+0.00%)                5                5         +0 (+0.00%)
bpf_execve_event_v61.o          event_execve                                6094               6059          -35 (-0.57%)      27456      26871     -585 (-2.13%)         671         636     -35 (-5.22%)              301              309         +8 (+2.66%)
bpf_execve_event_v61.o          execve_send                                   66                 69           +3 (+4.55%)        105        105       +0 (+0.00%)           5           5      +0 (+0.00%)                5                5         +0 (+0.00%)
bpf_exit.o                      event_exit                                    65                 53         -12 (-18.46%)         94         94       +0 (+0.00%)           8           8      +0 (+0.00%)                8                8         +0 (+0.00%)
bpf_fork.o                      event_wake_up_new_task                       179                209         +30 (+16.76%)        514        514       +0 (+0.00%)          30          30      +0 (+0.00%)               30               30         +0 (+0.00%)
bpf_generic_kprobe.o            generic_fmodret_override                      67                 70           +3 (+4.48%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_actions                      2386               1893        -493 (-20.66%)       6746       6746       +0 (+0.00%)         287         287      +0 (+0.00%)              207              207         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_event                         302                306           +4 (+1.32%)        580        580       +0 (+0.00%)          47          47      +0 (+0.00%)               47               47         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg1                  2679               2464         -215 (-8.03%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg2                  2487               2777        +290 (+11.66%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg3                  2905               2620         -285 (-9.81%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg4                  2834               2706         -128 (-4.52%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg5                  2771               2621         -150 (-5.41%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_output                         44                 41           -3 (-6.82%)         29         29       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_override                       40                 39           -1 (-2.50%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_process_event0               7817               7945         +128 (+1.64%)      21321      21001     -320 (-1.50%)        1440        1403     -37 (-2.57%)              906              889        -17 (-1.88%)
bpf_generic_kprobe.o            generic_kprobe_process_event1               7239               7468         +229 (+3.16%)      19782      19681     -101 (-0.51%)        1348        1339      -9 (-0.67%)              888              884         -4 (-0.45%)
bpf_generic_kprobe.o            generic_kprobe_process_event2               7415               7691         +276 (+3.72%)      19782      19680     -102 (-0.52%)        1348        1339      -9 (-0.67%)              888              884         -4 (-0.45%)
bpf_generic_kprobe.o            generic_kprobe_process_event3               7581               7024         -557 (-7.35%)      19779      19680      -99 (-0.50%)        1348        1338     -10 (-0.74%)              888              883         -5 (-0.56%)
bpf_generic_kprobe.o            generic_kprobe_process_event4               8016               7572         -444 (-5.54%)      19760      19658     -102 (-0.52%)        1355        1344     -11 (-0.81%)              891              885         -6 (-0.67%)
bpf_generic_kprobe.o            generic_kprobe_process_filter              43093              31779      -11314 (-26.25%)      77948      66684  -11264 (-14.45%)        6048        5009  -1039 (-17.18%)             1678             1640        -38 (-2.26%)
bpf_generic_kprobe_v53.o        generic_fmodret_override                      64                 66           +2 (+3.12%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_actions                     23258              14115       -9143 (-39.31%)      42545      42545       +0 (+0.00%)        1434        1434      +0 (+0.00%)              378              378         +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_event                         298                303           +5 (+1.68%)        583        583       +0 (+0.00%)          47          47      +0 (+0.00%)               47               47         +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg1                 25215              26076         +861 (+3.41%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg2                 24813              24288         -525 (-2.12%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg3                 26494              24362        -2132 (-8.05%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg4                 24373              24041         -332 (-1.36%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg5                 26265              24317        -1948 (-7.42%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v53.o        generic_kprobe_output                        119                148         +29 (+24.37%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)               19               19         +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_override                       38                 39           +1 (+2.63%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event0             102334             101040        -1294 (-1.26%)     283295     283172     -123 (-0.04%)       16044       16033     -11 (-0.07%)             8123             8123         +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event1             108349             106105        -2244 (-2.07%)     313458     315263    +1805 (+0.58%)       16524       16544     +20 (+0.12%)             8121             8123         +2 (+0.02%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event2             109991             105951        -4040 (-3.67%)     313458     315263    +1805 (+0.58%)       16524       16544     +20 (+0.12%)             8121             8123         +2 (+0.02%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event3             110279             109525         -754 (-0.68%)     313455     315260    +1805 (+0.58%)       16524       16544     +20 (+0.12%)             8121             8123         +2 (+0.02%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event4             106100             111486        +5386 (+5.08%)     296244     308555   +12311 (+4.16%)       16249       16386    +137 (+0.84%)             8116             8135        +19 (+0.23%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_filter              57465              54691        -2774 (-4.83%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)             1525             1421       -104 (-6.82%)
bpf_generic_kprobe_v61.o        generic_fmodret_override                      94                 89           -5 (-5.32%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_actions                     15903              15072         -831 (-5.23%)      42545      42545       +0 (+0.00%)        1434        1434      +0 (+0.00%)              378              378         +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_event                         303                340         +37 (+12.21%)        583        583       +0 (+0.00%)          47          47      +0 (+0.00%)               47               47         +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg1                 25870              24169        -1701 (-6.58%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg2                 26667              24070        -2597 (-9.74%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg3                 27248              24758        -2490 (-9.14%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg4                 27483              26107        -1376 (-5.01%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg5                 26764              26316         -448 (-1.67%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_kprobe_v61.o        generic_kprobe_output                        153                149           -4 (-2.61%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)               19               19         +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_override                       56                 51           -5 (-8.93%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event0              11184              10303         -881 (-7.88%)      58564      49822   -8742 (-14.93%)        1243        1108   -135 (-10.86%)              547              534        -13 (-2.38%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event1              12683              14576       +1893 (+14.93%)      68450      75716   +7266 (+10.62%)        1477        1566     +89 (+6.03%)              550              538        -12 (-2.18%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event2              12822              14709       +1887 (+14.72%)      68450      75715   +7265 (+10.61%)        1477        1566     +89 (+6.03%)              550              538        -12 (-2.18%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event3              13016              15029       +2013 (+15.47%)      68447      75715   +7268 (+10.62%)        1477        1565     +88 (+5.96%)              550              537        -13 (-2.36%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event4              11141              12815       +1674 (+15.03%)      58981      74350  +15369 (+26.06%)        1292        1522   +230 (+17.80%)              552              558         +6 (+1.09%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_filter              57674              51652       -6022 (-10.44%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)             1525             1421       -104 (-6.82%)
bpf_generic_retkprobe.o         generic_retkprobe_event                    11526              11239         -287 (-2.49%)      28282      28008     -274 (-0.97%)        1973        1949     -24 (-1.22%)             1168             1164         -4 (-0.34%)
bpf_generic_retkprobe_v53.o     generic_retkprobe_event                   108357             105058        -3299 (-3.04%)     231680     231505     -175 (-0.08%)       16131       16113     -18 (-0.11%)             8238             8235         -3 (-0.04%)
bpf_generic_retkprobe_v61.o     generic_retkprobe_event                    10694              11197         +503 (+4.70%)      24960      24775     -185 (-0.74%)        1854        1842     -12 (-0.65%)              656              648         -8 (-1.22%)
bpf_generic_tracepoint.o        generic_tracepoint_actions                  2259               1998        -261 (-11.55%)       6692       6692       +0 (+0.00%)         295         295      +0 (+0.00%)              224              224         +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg1                     2523               2569          +46 (+1.82%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg2                     2853               2692         -161 (-5.64%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg3                     2522               2902        +380 (+15.07%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg4                     2538               2837        +299 (+11.78%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg5                     2598               2640          +42 (+1.62%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_event                     691                617         -74 (-10.71%)       1487       1487       +0 (+0.00%)          92          92      +0 (+0.00%)               92               92         +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_event0                   7566               8026         +460 (+6.08%)      20592      20479     -113 (-0.55%)        1421        1409     -12 (-0.84%)              870              867         -3 (-0.34%)
bpf_generic_tracepoint.o        generic_tracepoint_event1                   7347               9822       +2475 (+33.69%)      19782      19681     -101 (-0.51%)        1348        1339      -9 (-0.67%)              888              884         -4 (-0.45%)
bpf_generic_tracepoint.o        generic_tracepoint_event2                   7218               7804         +586 (+8.12%)      19782      19680     -102 (-0.52%)        1348        1339      -9 (-0.67%)              888              884         -4 (-0.45%)
bpf_generic_tracepoint.o        generic_tracepoint_event3                   7296               7587         +291 (+3.99%)      19779      19680      -99 (-0.50%)        1348        1338     -10 (-0.74%)              888              883         -5 (-0.56%)
bpf_generic_tracepoint.o        generic_tracepoint_event4                   7215               8109        +894 (+12.39%)      19760      19658     -102 (-0.52%)        1355        1344     -11 (-0.81%)              891              885         -6 (-0.67%)
bpf_generic_tracepoint.o        generic_tracepoint_filter                  41153              33891       -7262 (-17.65%)      77948      66684  -11264 (-14.45%)        6048        5009  -1039 (-17.18%)             1678             1640        -38 (-2.26%)
bpf_generic_tracepoint.o        generic_tracepoint_output                     41                 36          -5 (-12.20%)         29         29       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_actions                 15139              14536         -603 (-3.98%)      41191      41191       +0 (+0.00%)        1397        1397      +0 (+0.00%)              390              390         +0 (+0.00%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg1                    26569              23775       -2794 (-10.52%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg2                    26853              24057       -2796 (-10.41%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg3                    27067              24044       -3023 (-11.17%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg4                    24410              23953         -457 (-1.87%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg5                    30439              24792       -5647 (-18.55%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event                     581                591          +10 (+1.72%)       1490       1490       +0 (+0.00%)          92          92      +0 (+0.00%)               92               92         +0 (+0.00%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event0                  94250              96057        +1807 (+1.92%)     215685     215586      -99 (-0.05%)       14954       14938     -16 (-0.11%)             7900             7897         -3 (-0.04%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event1                  93947              95801        +1854 (+1.97%)     215701     215602      -99 (-0.05%)       14955       14941     -14 (-0.09%)             7904             7899         -5 (-0.06%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event2                  96306              95407         -899 (-0.93%)     215701     215602      -99 (-0.05%)       14955       14941     -14 (-0.09%)             7904             7899         -5 (-0.06%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event3                  97718              90734        -6984 (-7.15%)     215698     215599      -99 (-0.05%)       14955       14941     -14 (-0.09%)             7904             7899         -5 (-0.06%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event4                  97822              89913        -7909 (-8.09%)     215757     215704      -53 (-0.02%)       14951       14942      -9 (-0.06%)             7896             7897         +1 (+0.01%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_filter                  64076              50012      -14064 (-21.95%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)             1525             1421       -104 (-6.82%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_output                    136                136           +0 (+0.00%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)               19               19         +0 (+0.00%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_actions                 16298              14731        -1567 (-9.61%)      41191      41191       +0 (+0.00%)        1397        1397      +0 (+0.00%)              390              390         +0 (+0.00%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg1                    27534              23721       -3813 (-13.85%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg2                    28248              24052       -4196 (-14.85%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg3                    29118              24012       -5106 (-17.54%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg4                    33309              23915       -9394 (-28.20%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg5                    28057              24983       -3074 (-10.96%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event                     555                531          -24 (-4.32%)       1490       1490       +0 (+0.00%)          92          92      +0 (+0.00%)               92               92         +0 (+0.00%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event0                   2128               2058          -70 (-3.29%)       4403       4100     -303 (-6.88%)         326         305     -21 (-6.44%)              300              292         -8 (-2.67%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event1                   1982               2028          +46 (+2.32%)       4409       4106     -303 (-6.87%)         328         304     -24 (-7.32%)              303              293        -10 (-3.30%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event2                   2357               2054        -303 (-12.86%)       4409       4106     -303 (-6.87%)         328         304     -24 (-7.32%)              303              293        -10 (-3.30%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event3                   2018               1835         -183 (-9.07%)       4406       4103     -303 (-6.88%)         328         304     -24 (-7.32%)              303              293        -10 (-3.30%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event4                   2094               1910         -184 (-8.79%)       4396       4124     -272 (-6.19%)         323         304     -19 (-5.88%)              299              293         -6 (-2.01%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_filter                  63620              50068      -13552 (-21.30%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)             1525             1421       -104 (-6.82%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_output                    120                141         +21 (+17.50%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)               19               19         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_actions                      1767               1928         +161 (+9.11%)       5702       5702       +0 (+0.00%)         248         248      +0 (+0.00%)              188              188         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_event                         232                207         -25 (-10.78%)        429        429       +0 (+0.00%)          33          33      +0 (+0.00%)               33               33         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg1                  2764               2832          +68 (+2.46%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg2                  2639               2675          +36 (+1.36%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg3                  3875               2529       -1346 (-34.74%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg4                  2646               2540         -106 (-4.01%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg5                  2510               2674         +164 (+6.53%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)              448              448         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_output                         41                 39           -2 (-4.88%)         29         29       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_process_event0               7804               8154         +350 (+4.48%)      21063      20890     -173 (-0.82%)        1419        1400     -19 (-1.34%)              889              887         -2 (-0.22%)
bpf_generic_uprobe.o            generic_uprobe_process_event1               8326               8041         -285 (-3.42%)      19782      19681     -101 (-0.51%)        1348        1339      -9 (-0.67%)              888              884         -4 (-0.45%)
bpf_generic_uprobe.o            generic_uprobe_process_event2               8183               7016       -1167 (-14.26%)      19782      19680     -102 (-0.52%)        1348        1339      -9 (-0.67%)              888              884         -4 (-0.45%)
bpf_generic_uprobe.o            generic_uprobe_process_event3               8127               6999       -1128 (-13.88%)      19779      19680      -99 (-0.50%)        1348        1338     -10 (-0.74%)              888              883         -5 (-0.56%)
bpf_generic_uprobe.o            generic_uprobe_process_event4               8072               7185        -887 (-10.99%)      19760      19658     -102 (-0.52%)        1355        1344     -11 (-0.81%)              891              885         -6 (-0.67%)
bpf_generic_uprobe.o            generic_uprobe_process_filter              40999              31572       -9427 (-22.99%)      77948      66684  -11264 (-14.45%)        6048        5009  -1039 (-17.18%)             1678             1640        -38 (-2.26%)
bpf_generic_uprobe_v53.o        generic_uprobe_actions                     14216              14310          +94 (+0.66%)      39443      39443       +0 (+0.00%)        1336        1336      +0 (+0.00%)              379              379         +0 (+0.00%)
bpf_generic_uprobe_v53.o        generic_uprobe_event                         236                223          -13 (-5.51%)        433        433       +0 (+0.00%)          33          33      +0 (+0.00%)               33               33         +0 (+0.00%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg1                 28012              26052        -1960 (-7.00%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg2                 27759              26451        -1308 (-4.71%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg3                 27301              25856        -1445 (-5.29%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg4                 26331              26187         -144 (-0.55%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg5                 27284              26122        -1162 (-4.26%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v53.o        generic_uprobe_output                        148                144           -4 (-2.70%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)               19               19         +0 (+0.00%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event0             103254              90496      -12758 (-12.36%)     215852     215620     -232 (-0.11%)       14972       14952     -20 (-0.13%)             7905             7899         -6 (-0.08%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event1             104517              90211      -14306 (-13.69%)     215701     215602      -99 (-0.05%)       14955       14941     -14 (-0.09%)             7904             7899         -5 (-0.06%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event2             101025              90027      -10998 (-10.89%)     215701     215602      -99 (-0.05%)       14955       14941     -14 (-0.09%)             7904             7899         -5 (-0.06%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event3              99776              95596        -4180 (-4.19%)     215698     215599      -99 (-0.05%)       14955       14941     -14 (-0.09%)             7904             7899         -5 (-0.06%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event4              99896              96233        -3663 (-3.67%)     215757     215704      -53 (-0.02%)       14951       14942      -9 (-0.06%)             7896             7897         +1 (+0.01%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_filter              65621              56496       -9125 (-13.91%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)             1525             1421       -104 (-6.82%)
bpf_generic_uprobe_v61.o        generic_uprobe_actions                     14050              14958         +908 (+6.46%)      39443      39443       +0 (+0.00%)        1336        1336      +0 (+0.00%)              379              379         +0 (+0.00%)
bpf_generic_uprobe_v61.o        generic_uprobe_event                         241                309         +68 (+28.22%)        433        433       +0 (+0.00%)          33          33      +0 (+0.00%)               33               33         +0 (+0.00%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg1                 30324              26943       -3381 (-11.15%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg2                 26755              26758           +3 (+0.01%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg3                 28337              27992         -345 (-1.22%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg4                 26332              27308         +976 (+3.71%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg5                 27209              26780         -429 (-1.58%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_generic_uprobe_v61.o        generic_uprobe_output                        138                146           +8 (+5.80%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)               19               19         +0 (+0.00%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event0               2194               2133          -61 (-2.78%)       4395       4152     -243 (-5.53%)         329         312     -17 (-5.17%)              303              297         -6 (-1.98%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event1               1885               1832          -53 (-2.81%)       4409       4106     -303 (-6.87%)         328         304     -24 (-7.32%)              303              293        -10 (-3.30%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event2               2775               1966        -809 (-29.15%)       4409       4106     -303 (-6.87%)         328         304     -24 (-7.32%)              303              293        -10 (-3.30%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event3               3237               2004       -1233 (-38.09%)       4406       4103     -303 (-6.88%)         328         304     -24 (-7.32%)              303              293        -10 (-3.30%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event4               1950               2031          +81 (+4.15%)       4396       4124     -272 (-6.19%)         323         304     -19 (-5.88%)              299              293         -6 (-2.01%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_filter              62774              56727        -6047 (-9.63%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)             1525             1421       -104 (-6.82%)
bpf_globals.o                   read_globals_test                              0                  0           +0 (+0.00%)          0          0       +0 (+0.00%)           0           0      +0 (+0.00%)                0                0         +0 (+0.00%)
bpf_killer.o                    killer                                        27                 28           +1 (+3.70%)         33         33       +0 (+0.00%)           3           3      +0 (+0.00%)                3                3         +0 (+0.00%)
bpf_loader.o                    loader_kprobe                                 84                 82           -2 (-2.38%)        144        144       +0 (+0.00%)          10          10      +0 (+0.00%)               10               10         +0 (+0.00%)
bpf_lseek.o                     test_lseek                                    54                 41         -13 (-24.07%)         67         67       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_multi_killer.o              killer                                        22                 22           +0 (+0.00%)         33         33       +0 (+0.00%)           3           3      +0 (+0.00%)                3                3         +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_fmodret_override                     108                 73         -35 (-32.41%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_actions                     29346              14095      -15251 (-51.97%)      42545      42545       +0 (+0.00%)        1434        1434      +0 (+0.00%)              378              378         +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_event                         339                345           +6 (+1.77%)        585        585       +0 (+0.00%)          48          48      +0 (+0.00%)               48               48         +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg1                 33490              23550       -9940 (-29.68%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg2                 42586              24318      -18268 (-42.90%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg3                 39256              24731      -14525 (-37.00%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg4                 41607              23955      -17652 (-42.43%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg5                 49382              24518      -24864 (-50.35%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v53.o          generic_kprobe_output                        185                128         -57 (-30.81%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)               19               19         +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_override                       62                 41         -21 (-33.87%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event0             113628             100702      -12926 (-11.38%)     283295     283172     -123 (-0.04%)       16044       16033     -11 (-0.07%)             8123             8123         +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event1             132058             106791      -25267 (-19.13%)     313458     315263    +1805 (+0.58%)       16524       16544     +20 (+0.12%)             8121             8123         +2 (+0.02%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event2             122505             106459      -16046 (-13.10%)     313458     315263    +1805 (+0.58%)       16524       16544     +20 (+0.12%)             8121             8123         +2 (+0.02%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event3             127258             106633      -20625 (-16.21%)     313455     315260    +1805 (+0.58%)       16524       16544     +20 (+0.12%)             8121             8123         +2 (+0.02%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event4             121800             111903        -9897 (-8.13%)     296244     308555   +12311 (+4.16%)       16249       16386    +137 (+0.84%)             8116             8135        +19 (+0.23%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_filter              73918              54826      -19092 (-25.83%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)             1525             1421       -104 (-6.82%)
bpf_multi_kprobe_v61.o          generic_fmodret_override                      71                 91         +20 (+28.17%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_actions                     16654              15088        -1566 (-9.40%)      42545      42545       +0 (+0.00%)        1434        1434      +0 (+0.00%)              378              378         +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_event                         517                278        -239 (-46.23%)        585        585       +0 (+0.00%)          48          48      +0 (+0.00%)               48               48         +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg1                 41140              26793      -14347 (-34.87%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg2                 30326              26454       -3872 (-12.77%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg3                 38517              24452      -14065 (-36.52%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg4                 36157              24539      -11618 (-32.13%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg5                 40673              25657      -15016 (-36.92%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)             1726             1742        +16 (+0.93%)
bpf_multi_kprobe_v61.o          generic_kprobe_output                        153                150           -3 (-1.96%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)               19               19         +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_override                       40                 51         +11 (+27.50%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)                2                2         +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event0              17270               9818       -7452 (-43.15%)      58564      49822   -8742 (-14.93%)        1243        1108   -135 (-10.86%)              547              534        -13 (-2.38%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event1              16763              13670       -3093 (-18.45%)      68450      75716   +7266 (+10.62%)        1477        1566     +89 (+6.03%)              550              538        -12 (-2.18%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event2              14321              14000         -321 (-2.24%)      68450      75715   +7265 (+10.61%)        1477        1566     +89 (+6.03%)              550              538        -12 (-2.18%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event3              14824              13829         -995 (-6.71%)      68447      75715   +7268 (+10.62%)        1477        1565     +88 (+5.96%)              550              537        -13 (-2.36%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event4              14745              14029         -716 (-4.86%)      58981      74350  +15369 (+26.06%)        1292        1522   +230 (+17.80%)              552              558         +6 (+1.09%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_filter              73994              54979      -19015 (-25.70%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)             1525             1421       -104 (-6.82%)
bpf_multi_retkprobe_v53.o       generic_retkprobe_event                   127625             110224      -17401 (-13.63%)     231631     231456     -175 (-0.08%)       16130       16112     -18 (-0.11%)             8239             8236         -3 (-0.04%)
bpf_multi_retkprobe_v61.o       generic_retkprobe_event                    12110               9753       -2357 (-19.46%)      24404      24110     -294 (-1.20%)        1859        1841     -18 (-0.97%)              658              647        -11 (-1.67%)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-11-01  7:56     ` Jiri Olsa
@ 2023-11-01 16:27       ` Andrii Nakryiko
  2023-11-02  9:54         ` Jiri Olsa
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-01 16:27 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Wed, Nov 1, 2023 at 12:56 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Oct 30, 2023 at 10:22:48PM -0700, Andrii Nakryiko wrote:
> > On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Now that precision backtracking supports register spill/fill
> > > to/from the stack, there is another opportunity to be exploited
> > > here: minimizing precise STACK_ZERO cases. With a simple code
> > > change we can rely on initially imprecise register spill tracking
> > > for cases when the register spilled to the stack was a known zero.
> > >
> > > This is a very common case for initializing variables on the
> > > stack, including rather large structures. Often the zero has no
> > > special meaning for the subsequent BPF program logic and is
> > > overwritten with non-zero values soon afterwards. But due to
> > > STACK_ZERO vs STACK_MISC tracking, such initial zero-initialization
> > > actually causes duplication of verifier states, as STACK_ZERO is
> > > clearly different from STACK_MISC or a spilled SCALAR_VALUE
> > > register.
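> > >
> > > For example, a pattern like the following (an illustrative sketch,
> > > not taken from any particular program; the struct and helper choice
> > > are arbitrary) is typically compiled by clang into an r1 = 0
> > > followed by a series of stack stores spilling that known-zero
> > > register:
> > >
> > >   struct event { __u64 ts; __u32 pid; __u32 flags; } e;
> > >
> > >   __builtin_memset(&e, 0, sizeof(e)); /* spills known-zero r1 */
> > >   e.ts = bpf_ktime_get_ns();          /* zeros overwritten later */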
> > >
> > > The effect of this (now) trivial change is huge, as can be seen
> > > below. These are the differences for BPF selftests, Cilium, and
> > > Meta-internal BPF object files relative to the previous patch in
> > > this series. You can see improvements ranging from single-digit
> > > percentages for instructions and states all the way to 50-60%
> > > reductions for some of the Meta-internal host agent programs, and
> > > even for some Cilium programs.
> > >
> > > For the Meta-internal ones I kept only the differences for the
> > > largest BPF object files by states/instructions, as there were too
> > > many differences in the overall output. All the differences were
> > > improvements, reducing the number of states and thus the number of
> > > instructions validated.
> > >
> > > Note that Meta-internal BPF object file names are not printed
> > > below. The many copies of balancer_ingress are actually many
> > > different configurations of Katran, so they are different BPF
> > > programs, which explains the state reduction going from -16% all
> > > the way to 31%, depending on BPF program logic complexity.
> > >
> > > SELFTESTS
> > > =========
> > > File                                     Program                  Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
> > > ---------------------------------------  -----------------------  ---------  ---------  ---------------  ----------  ----------  -------------
> > > bpf_iter_netlink.bpf.linked3.o           dump_netlink                   148        104    -44 (-29.73%)           8           5   -3 (-37.50%)
> > > bpf_iter_unix.bpf.linked3.o              dump_unix                     8474       8404     -70 (-0.83%)         151         147    -4 (-2.65%)
> > > bpf_loop.bpf.linked3.o                   stack_check                    560        324   -236 (-42.14%)          42          24  -18 (-42.86%)
> > > local_storage_bench.bpf.linked3.o        get_local                      120         77    -43 (-35.83%)           9           6   -3 (-33.33%)
> > > loop6.bpf.linked3.o                      trace_virtqueue_add_sgs      10167       9868    -299 (-2.94%)         226         206   -20 (-8.85%)
> > > pyperf600_bpf_loop.bpf.linked3.o         on_event                      4872       3423  -1449 (-29.74%)         322         229  -93 (-28.88%)
> > > strobemeta.bpf.linked3.o                 on_event                    180697     176036   -4661 (-2.58%)        4780        4734   -46 (-0.96%)
> > > test_cls_redirect.bpf.linked3.o          cls_redirect                 65594      65401    -193 (-0.29%)        4230        4212   -18 (-0.43%)
> > > test_global_func_args.bpf.linked3.o      test_cls                       145        136      -9 (-6.21%)          10           9   -1 (-10.00%)
> > > test_l4lb.bpf.linked3.o                  balancer_ingress              4760       2612  -2148 (-45.13%)         113         102   -11 (-9.73%)
> > > test_l4lb_noinline.bpf.linked3.o         balancer_ingress              4845       4877     +32 (+0.66%)         219         221    +2 (+0.91%)
> > > test_l4lb_noinline_dynptr.bpf.linked3.o  balancer_ingress              2072       2087     +15 (+0.72%)          97          98    +1 (+1.03%)
> > > test_seg6_loop.bpf.linked3.o             __add_egr_x                  12440       9975  -2465 (-19.82%)         364         353   -11 (-3.02%)
> > > test_tcp_hdr_options.bpf.linked3.o       estab                         2558       2572     +14 (+0.55%)         179         180    +1 (+0.56%)
> > > test_xdp_dynptr.bpf.linked3.o            _xdp_tx_iptunnel               645        596     -49 (-7.60%)          26          24    -2 (-7.69%)
> > > test_xdp_noinline.bpf.linked3.o          balancer_ingress_v6           3520       3516      -4 (-0.11%)         216         216    +0 (+0.00%)
> > > xdp_synproxy_kern.bpf.linked3.o          syncookie_tc                 82661      81241   -1420 (-1.72%)        5073        5155   +82 (+1.62%)
> > > xdp_synproxy_kern.bpf.linked3.o          syncookie_xdp                84964      82297   -2667 (-3.14%)        5130        5157   +27 (+0.53%)
> > >
> > > META-INTERNAL
> > > =============
> > > Program                                 Insns (A)  Insns (B)  Insns      (DIFF)  States (A)  States (B)  States   (DIFF)
> > > --------------------------------------  ---------  ---------  -----------------  ----------  ----------  ---------------
> > > balancer_ingress                            27925      23608    -4317 (-15.46%)        1488        1482      -6 (-0.40%)
> > > balancer_ingress                            31824      27546    -4278 (-13.44%)        1658        1652      -6 (-0.36%)
> > > balancer_ingress                            32213      27935    -4278 (-13.28%)        1689        1683      -6 (-0.36%)
> > > balancer_ingress                            32213      27935    -4278 (-13.28%)        1689        1683      -6 (-0.36%)
> > > balancer_ingress                            31824      27546    -4278 (-13.44%)        1658        1652      -6 (-0.36%)
> > > balancer_ingress                            38647      29562    -9085 (-23.51%)        2069        1835   -234 (-11.31%)
> > > balancer_ingress                            38647      29562    -9085 (-23.51%)        2069        1835   -234 (-11.31%)
> > > balancer_ingress                            40339      30792    -9547 (-23.67%)        2193        1934   -259 (-11.81%)
> > > balancer_ingress                            37321      29055    -8266 (-22.15%)        1972        1795    -177 (-8.98%)
> > > balancer_ingress                            38176      29753    -8423 (-22.06%)        2008        1831    -177 (-8.81%)
> > > balancer_ingress                            29193      20910    -8283 (-28.37%)        1599        1422   -177 (-11.07%)
> > > balancer_ingress                            30013      21452    -8561 (-28.52%)        1645        1447   -198 (-12.04%)
> > > balancer_ingress                            28691      24290    -4401 (-15.34%)        1545        1531     -14 (-0.91%)
> > > balancer_ingress                            34223      28965    -5258 (-15.36%)        1984        1875    -109 (-5.49%)
> > > balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
> > > balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
> > > balancer_ingress                            35868      26455    -9413 (-26.24%)        2140        1827   -313 (-14.63%)
> > > balancer_ingress                            35868      26455    -9413 (-26.24%)        2140        1827   -313 (-14.63%)
> > > balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
> > > balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
> > > balancer_ingress                            34844      29485    -5359 (-15.38%)        2036        1918    -118 (-5.80%)
> > > fbflow_egress                                3256       2652     -604 (-18.55%)         218         192    -26 (-11.93%)
> > > fbflow_ingress                               1026        944       -82 (-7.99%)          70          63     -7 (-10.00%)
> > > sslwall_tc_egress                            8424       7360    -1064 (-12.63%)         498         458     -40 (-8.03%)
> > > syar_accept_protect                         15040       9539    -5501 (-36.58%)         364         220   -144 (-39.56%)
> > > syar_connect_tcp_v6                         15036       9535    -5501 (-36.59%)         360         216   -144 (-40.00%)
> > > syar_connect_udp_v4                         15039       9538    -5501 (-36.58%)         361         217   -144 (-39.89%)
> > > syar_connect_connect4_protect4              24805      15833    -8972 (-36.17%)         756         480   -276 (-36.51%)
> > > syar_lsm_file_open                         167772     151813    -15959 (-9.51%)        1836        1667    -169 (-9.20%)
> > > syar_namespace_create_new                   14805       9304    -5501 (-37.16%)         353         209   -144 (-40.79%)
> > > syar_python3_detect                         17531      12030    -5501 (-31.38%)         391         247   -144 (-36.83%)
> > > syar_ssh_post_fork                          16412      10911    -5501 (-33.52%)         405         261   -144 (-35.56%)
> > > syar_enter_execve                           14728       9227    -5501 (-37.35%)         345         201   -144 (-41.74%)
> > > syar_enter_execveat                         14728       9227    -5501 (-37.35%)         345         201   -144 (-41.74%)
> > > syar_exit_execve                            16622      11121    -5501 (-33.09%)         376         232   -144 (-38.30%)
> > > syar_exit_execveat                          16622      11121    -5501 (-33.09%)         376         232   -144 (-38.30%)
> > > syar_syscalls_kill                          15288       9787    -5501 (-35.98%)         398         254   -144 (-36.18%)
> > > syar_task_enter_pivot_root                  14898       9397    -5501 (-36.92%)         357         213   -144 (-40.34%)
> > > syar_syscalls_setreuid                      16678      11177    -5501 (-32.98%)         429         285   -144 (-33.57%)
> > > syar_syscalls_setuid                        16678      11177    -5501 (-32.98%)         429         285   -144 (-33.57%)
> > > syar_syscalls_process_vm_readv              14959       9458    -5501 (-36.77%)         364         220   -144 (-39.56%)
> > > syar_syscalls_process_vm_writev             15757      10256    -5501 (-34.91%)         390         246   -144 (-36.92%)
> > > do_uprobe                                   15519      10018    -5501 (-35.45%)         373         229   -144 (-38.61%)
> > > edgewall                                   179715      55783  -123932 (-68.96%)       12607        3999  -8608 (-68.28%)
> > > bictcp_state                                 7570       4131    -3439 (-45.43%)         496         269   -227 (-45.77%)
> > > cubictcp_state                               7570       4131    -3439 (-45.43%)         496         269   -227 (-45.77%)
> > > tcp_rate_skb_delivered                        447        272     -175 (-39.15%)          29          18    -11 (-37.93%)
> > > kprobe__bbr_set_state                        4566       2615    -1951 (-42.73%)         209         124    -85 (-40.67%)
> > > kprobe__bictcp_state                         4566       2615    -1951 (-42.73%)         209         124    -85 (-40.67%)
> > > inet_sock_set_state                          1501       1337     -164 (-10.93%)          93          85      -8 (-8.60%)
> > > tcp_retransmit_skb                           1145        981     -164 (-14.32%)          67          59     -8 (-11.94%)
> > > tcp_retransmit_synack                        1183        951     -232 (-19.61%)          67          55    -12 (-17.91%)
> > > bpf_tcptuner                                 1459       1187     -272 (-18.64%)          99          80    -19 (-19.19%)
> > > tw_egress                                     801        776       -25 (-3.12%)          69          66      -3 (-4.35%)
> > > tw_ingress                                    795        770       -25 (-3.14%)          69          66      -3 (-4.35%)
> > > ttls_tc_ingress                             19025      19383      +358 (+1.88%)         470         465      -5 (-1.06%)
> > > ttls_nat_egress                               490        299     -191 (-38.98%)          33          20    -13 (-39.39%)
> > > ttls_nat_ingress                              448        285     -163 (-36.38%)          32          21    -11 (-34.38%)
> > > tw_twfw_egress                             511127     212071  -299056 (-58.51%)       16733        8504  -8229 (-49.18%)
> > > tw_twfw_ingress                            500095     212069  -288026 (-57.59%)       16223        8504  -7719 (-47.58%)
> > > tw_twfw_tc_eg                              511113     212064  -299049 (-58.51%)       16732        8504  -8228 (-49.18%)
> > > tw_twfw_tc_in                              500095     212069  -288026 (-57.59%)       16223        8504  -7719 (-47.58%)
> > > tw_twfw_egress                              12632      12435      -197 (-1.56%)         276         260     -16 (-5.80%)
> > > tw_twfw_ingress                             12631      12454      -177 (-1.40%)         278         261     -17 (-6.12%)
> > > tw_twfw_tc_eg                               12595      12435      -160 (-1.27%)         274         259     -15 (-5.47%)
> > > tw_twfw_tc_in                               12631      12454      -177 (-1.40%)         278         261     -17 (-6.12%)
> > > tw_xdp_dump                                   266        209      -57 (-21.43%)           9           8     -1 (-11.11%)
> > >
> > > CILIUM
> > > =========
> > > File           Program                           Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States  (DIFF)
> > > -------------  --------------------------------  ---------  ---------  ----------------  ----------  ----------  --------------
> > > bpf_host.o     cil_to_netdev                          6047       4578   -1469 (-24.29%)         362         249  -113 (-31.22%)
> > > bpf_host.o     handle_lxc_traffic                     2227       1585    -642 (-28.83%)         156         103   -53 (-33.97%)
> > > bpf_host.o     tail_handle_ipv4_from_netdev           2244       1458    -786 (-35.03%)         163         106   -57 (-34.97%)
> > > bpf_host.o     tail_handle_nat_fwd_ipv4              21022      10479  -10543 (-50.15%)        1289         670  -619 (-48.02%)
> > > bpf_host.o     tail_handle_nat_fwd_ipv6              15433      11375   -4058 (-26.29%)         905         643  -262 (-28.95%)
> > > bpf_host.o     tail_ipv4_host_policy_ingress          2219       1367    -852 (-38.40%)         161          96   -65 (-40.37%)
> > > bpf_host.o     tail_nodeport_nat_egress_ipv4         22460      19862   -2598 (-11.57%)        1469        1293  -176 (-11.98%)
> > > bpf_host.o     tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
> > > bpf_host.o     tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
> > > bpf_host.o     tail_nodeport_nat_ipv6_egress          3702       3542     -160 (-4.32%)         215         205    -10 (-4.65%)
> > > bpf_lxc.o      tail_handle_nat_fwd_ipv4              21022      10479  -10543 (-50.15%)        1289         670  -619 (-48.02%)
> > > bpf_lxc.o      tail_handle_nat_fwd_ipv6              15433      11375   -4058 (-26.29%)         905         643  -262 (-28.95%)
> > > bpf_lxc.o      tail_ipv4_ct_egress                    5073       3374   -1699 (-33.49%)         262         172   -90 (-34.35%)
> > > bpf_lxc.o      tail_ipv4_ct_ingress                   5093       3385   -1708 (-33.54%)         262         172   -90 (-34.35%)
> > > bpf_lxc.o      tail_ipv4_ct_ingress_policy_only       5093       3385   -1708 (-33.54%)         262         172   -90 (-34.35%)
> > > bpf_lxc.o      tail_ipv6_ct_egress                    4593       3878    -715 (-15.57%)         194         151   -43 (-22.16%)
> > > bpf_lxc.o      tail_ipv6_ct_ingress                   4606       3891    -715 (-15.52%)         194         151   -43 (-22.16%)
> > > bpf_lxc.o      tail_ipv6_ct_ingress_policy_only       4606       3891    -715 (-15.52%)         194         151   -43 (-22.16%)
> > > bpf_lxc.o      tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
> > > bpf_lxc.o      tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
> > > bpf_overlay.o  tail_handle_nat_fwd_ipv4              20524      10114  -10410 (-50.72%)        1271         638  -633 (-49.80%)
> > > bpf_overlay.o  tail_nodeport_nat_egress_ipv4         22718      19490   -3228 (-14.21%)        1475        1275  -200 (-13.56%)
> > > bpf_overlay.o  tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
> > > bpf_overlay.o  tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
> > > bpf_overlay.o  tail_nodeport_nat_ipv6_egress          3638       3548      -90 (-2.47%)         209         203     -6 (-2.87%)
> > > bpf_overlay.o  tail_rev_nodeport_lb4                  4368       3820    -548 (-12.55%)         248         215   -33 (-13.31%)
> > > bpf_overlay.o  tail_rev_nodeport_lb6                  2867       2428    -439 (-15.31%)         167         140   -27 (-16.17%)
> > > bpf_sock.o     cil_sock6_connect                      1718       1703      -15 (-0.87%)         100          99     -1 (-1.00%)
> > > bpf_xdp.o      tail_handle_nat_fwd_ipv4              12917      12443     -474 (-3.67%)         875         849    -26 (-2.97%)
> > > bpf_xdp.o      tail_handle_nat_fwd_ipv6              13515      13264     -251 (-1.86%)         715         702    -13 (-1.82%)
> > > bpf_xdp.o      tail_lb_ipv4                          39492      36367    -3125 (-7.91%)        2430        2251   -179 (-7.37%)
> > > bpf_xdp.o      tail_lb_ipv6                          80441      78058    -2383 (-2.96%)        3647        3523   -124 (-3.40%)
> > > bpf_xdp.o      tail_nodeport_ipv6_dsr                 1038        901    -137 (-13.20%)          61          55     -6 (-9.84%)
> > > bpf_xdp.o      tail_nodeport_nat_egress_ipv4         13027      12096     -931 (-7.15%)         868         809    -59 (-6.80%)
> > > bpf_xdp.o      tail_nodeport_nat_ingress_ipv4         7617       5900   -1717 (-22.54%)         522         413  -109 (-20.88%)
> > > bpf_xdp.o      tail_nodeport_nat_ingress_ipv6         7575       7395     -180 (-2.38%)         383         374     -9 (-2.35%)
> > > bpf_xdp.o      tail_rev_nodeport_lb4                  6808       6739      -69 (-1.01%)         403         396     -7 (-1.74%)
> > > bpf_xdp.o      tail_rev_nodeport_lb6                 16173      15847     -326 (-2.02%)        1010         990    -20 (-1.98%)
> > >
> >
> > So I also want to mention that while I did spot-check a few programs
> > (not the biggest ones) and they did seem to have correct verification
> > flow, I obviously can't easily validate verifier log_level=2 logs for
> > all of the changes above, especially those multi-thousand-state
> > programs. I'd really appreciate it if someone from Isovalent/Cilium
> > could sanity-check a Cilium program or two, just in case. Thanks!
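> >
> > (In case it helps whoever picks this up: below is a rough sketch of
> > how such a log can be captured with libbpf, assuming a reasonably
> > recent libbpf that provides bpf_program__set_log_level(); the object
> > and program names are just placeholders:)
> >
> >     #include <stdio.h>
> >     #include <bpf/libbpf.h>
> >
> >     static char log_buf[64 * 1024 * 1024];
> >
> >     int main(void)
> >     {
> >             struct bpf_object *obj = bpf_object__open("bpf_host.o");
> >             struct bpf_program *prog;
> >
> >             if (!obj)
> >                     return 1;
> >             prog = bpf_object__find_program_by_name(obj, "cil_to_netdev");
> >             if (!prog)
> >                     return 1;
> >             /* log_level=2 dumps verifier state after every insn */
> >             bpf_program__set_log_level(prog, 2);
> >             bpf_program__set_log_buf(prog, log_buf, sizeof(log_buf));
> >             bpf_object__load(obj); /* buffer is filled on success too */
> >             fputs(log_buf, stdout);
> >             bpf_object__close(obj);
> >             return 0;
> >     }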
>
> fyi, I was curious so I tried that on top of the tetragon programs;
> results seem up and down, but verification time is mostly lower ;-)
>

Nice! Can you please regenerate the results and sort by either insns_diff
(absolute difference, not percentage) or states_diff? It would be
easier to see the top 10 improvements and regressions that way.
Percentages by themselves can be misleading.

Oh, and peak states are probably not that useful, so maybe just use
`-e file,prog,duration,insns,states -s insns_diff`?

> jirka
>
>
> ---
> $ veristat --compare veristat.old veristat.new
>
> File                            Program                        Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)  Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States   (DIFF)  Peak states (A)  Peak states (B)  Peak states (DIFF)
> ------------------------------  -----------------------------  -----------------  -----------------  --------------------  ---------  ---------  ----------------  ----------  ----------  ---------------  ---------------  ---------------  ------------------
> bpf_cgroup_mkdir.o              tg_tp_cgrp_mkdir                             206                190          -16 (-7.77%)        581        581       +0 (+0.00%)          24          24      +0 (+0.00%)               24               24         +0 (+0.00%)
> bpf_cgroup_release.o            tg_tp_cgrp_release                           114                104          -10 (-8.77%)        381        381       +0 (+0.00%)          13          13      +0 (+0.00%)               13               13         +0 (+0.00%)
> bpf_cgroup_rmdir.o              tg_tp_cgrp_rmdir                             126                121           -5 (-3.97%)        381        381       +0 (+0.00%)          13          13      +0 (+0.00%)               13               13         +0 (+0.00%)
> bpf_execve_bprm_commit_creds.o  tg_kp_bprm_committing_creds                  100                 95           -5 (-5.00%)        163        163       +0 (+0.00%)          14          14      +0 (+0.00%)               14               14         +0 (+0.00%)
> bpf_execve_event.o              event_execve                               12147              12843         +696 (+5.73%)      35096      34723     -373 (-1.06%)        2278        2251     -27 (-1.19%)             1110             1115         +5 (+0.45%)
> bpf_execve_event.o              execve_send                                   93                 57         -36 (-38.71%)         82         82       +0 (+0.00%)           6           6      +0 (+0.00%)                6                6         +0 (+0.00%)
> bpf_execve_event_v53.o          event_execve                               97457              98430         +973 (+1.00%)     245365     239363    -6002 (-2.45%)       15430       15334     -96 (-0.62%)             7994             7929        -65 (-0.81%)
> bpf_execve_event_v53.o          execve_send                                   52                 54           +2 (+3.85%)        105        105       +0 (+0.00%)           5           5      +0 (+0.00%)                5                5         +0 (+0.00%)
> bpf_execve_event_v61.o          event_execve                                6094               6059          -35 (-0.57%)      27456      26871     -585 (-2.13%)         671         636     -35 (-5.22%)              301              309         +8 (+2.66%)
> bpf_execve_event_v61.o          execve_send                                   66                 69           +3 (+4.55%)        105        105       +0 (+0.00%)           5           5      +0 (+0.00%)                5                5         +0 (+0.00%)
> bpf_exit.o                      event_exit                                    65                 53         -12 (-18.46%)         94         94       +0 (+0.00%)           8           8      +0 (+0.00%)                8                8         +0 (+0.00%)
> bpf_fork.o                      event_wake_up_new_task                       179                209         +30 (+16.76%)        514        514       +0 (+0.00%)          30          30      +0 (+0.00%)               30               30         +0 (+0.00%)

[...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-11-01 16:27       ` Andrii Nakryiko
@ 2023-11-02  9:54         ` Jiri Olsa
  0 siblings, 0 replies; 45+ messages in thread
From: Jiri Olsa @ 2023-11-02  9:54 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Andrii Nakryiko, bpf, ast, daniel, martin.lau,
	kernel-team

On Wed, Nov 01, 2023 at 09:27:21AM -0700, Andrii Nakryiko wrote:
> On Wed, Nov 1, 2023 at 12:56 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Oct 30, 2023 at 10:22:48PM -0700, Andrii Nakryiko wrote:
> > > On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > Now that precision backtracking supports register spill/fill to/from
> > > > stack, there is another opportunity to be exploited here: minimizing
> > > > precise STACK_ZERO cases. With a simple code change we can rely on
> > > > initially imprecise register spill tracking for cases when the
> > > > register spilled to the stack was a known zero.
> > > >
> > > > This is a very common case for initializing variables on the stack,
> > > > including rather large structures. Oftentimes zero has no special
> > > > meaning for the subsequent BPF program logic and is overwritten with
> > > > non-zero values soon afterwards. But due to STACK_ZERO vs STACK_MISC
> > > > tracking, such initial zero-initialization actually causes
> > > > duplication of verifier states, as STACK_ZERO is clearly different
> > > > from STACK_MISC or a spilled SCALAR_VALUE register.
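> > > >
> > > > As an illustration, here is a minimal sketch (a hypothetical
> > > > program, not taken from this series) of the kind of
> > > > zero-initialization that hits this path; the compiler typically
> > > > lowers it into a constant-zero register spilled into each stack
> > > > slot:
> > > >
> > > >     struct val { long a; long b; };
> > > >
> > > >     int handle(void *ctx)
> > > >     {
> > > >             struct val v = {};  /* r1 = 0                          */
> > > >                                 /* *(u64 *)(r10 - 16) = r1         */
> > > >                                 /* *(u64 *)(r10 - 8) = r1          */
> > > >             v.a = 42;           /* the zeros are overwritten soon  */
> > > >             v.b = 43;           /* after, so STACK_ZERO knowledge  */
> > > >                                 /* was never needed for safety     */
> > > >             return 0;
> > > >     }
> > > >
> > > > With this change such spills are tracked as imprecise spilled
> > > > scalar registers rather than STACK_ZERO, so states that differ only
> > > > in such never-used stack contents can be pruned as equivalent.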
> > > >
> > > > The effect of this (now) trivial change is huge, as can be seen below.
> > > > These are differences for BPF selftests, Cilium, and Meta-internal
> > > > BPF object files relative to the previous patch in this series. You
> > > > can see improvements ranging from single-digit percentages for
> > > > instructions and states all the way to 50-60% reductions for some of
> > > > the Meta-internal host agent programs, and even some Cilium programs.
> > > >
> > > > For Meta-internal ones I left only the differences for the largest
> > > > BPF object files by states/instructions, as there were too many
> > > > differences in the overall output. All the differences were
> > > > improvements, reducing the number of states and thus the number of
> > > > instructions validated.
> > > >
> > > > Note, Meta-internal BPF object file names are not printed below.
> > > > The many copies of balancer_ingress are actually different
> > > > configurations of Katran, so they are different BPF programs, which
> > > > explains state reductions ranging from 16% all the way to 31%,
> > > > depending on BPF program logic complexity.
> > > >
> > > > SELFTESTS
> > > > =========
> > > > File                                     Program                  Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
> > > > ---------------------------------------  -----------------------  ---------  ---------  ---------------  ----------  ----------  -------------
> > > > bpf_iter_netlink.bpf.linked3.o           dump_netlink                   148        104    -44 (-29.73%)           8           5   -3 (-37.50%)
> > > > bpf_iter_unix.bpf.linked3.o              dump_unix                     8474       8404     -70 (-0.83%)         151         147    -4 (-2.65%)
> > > > bpf_loop.bpf.linked3.o                   stack_check                    560        324   -236 (-42.14%)          42          24  -18 (-42.86%)
> > > > local_storage_bench.bpf.linked3.o        get_local                      120         77    -43 (-35.83%)           9           6   -3 (-33.33%)
> > > > loop6.bpf.linked3.o                      trace_virtqueue_add_sgs      10167       9868    -299 (-2.94%)         226         206   -20 (-8.85%)
> > > > pyperf600_bpf_loop.bpf.linked3.o         on_event                      4872       3423  -1449 (-29.74%)         322         229  -93 (-28.88%)
> > > > strobemeta.bpf.linked3.o                 on_event                    180697     176036   -4661 (-2.58%)        4780        4734   -46 (-0.96%)
> > > > test_cls_redirect.bpf.linked3.o          cls_redirect                 65594      65401    -193 (-0.29%)        4230        4212   -18 (-0.43%)
> > > > test_global_func_args.bpf.linked3.o      test_cls                       145        136      -9 (-6.21%)          10           9   -1 (-10.00%)
> > > > test_l4lb.bpf.linked3.o                  balancer_ingress              4760       2612  -2148 (-45.13%)         113         102   -11 (-9.73%)
> > > > test_l4lb_noinline.bpf.linked3.o         balancer_ingress              4845       4877     +32 (+0.66%)         219         221    +2 (+0.91%)
> > > > test_l4lb_noinline_dynptr.bpf.linked3.o  balancer_ingress              2072       2087     +15 (+0.72%)          97          98    +1 (+1.03%)
> > > > test_seg6_loop.bpf.linked3.o             __add_egr_x                  12440       9975  -2465 (-19.82%)         364         353   -11 (-3.02%)
> > > > test_tcp_hdr_options.bpf.linked3.o       estab                         2558       2572     +14 (+0.55%)         179         180    +1 (+0.56%)
> > > > test_xdp_dynptr.bpf.linked3.o            _xdp_tx_iptunnel               645        596     -49 (-7.60%)          26          24    -2 (-7.69%)
> > > > test_xdp_noinline.bpf.linked3.o          balancer_ingress_v6           3520       3516      -4 (-0.11%)         216         216    +0 (+0.00%)
> > > > xdp_synproxy_kern.bpf.linked3.o          syncookie_tc                 82661      81241   -1420 (-1.72%)        5073        5155   +82 (+1.62%)
> > > > xdp_synproxy_kern.bpf.linked3.o          syncookie_xdp                84964      82297   -2667 (-3.14%)        5130        5157   +27 (+0.53%)
> > > >
> > > > META-INTERNAL
> > > > =============
> > > > Program                                 Insns (A)  Insns (B)  Insns      (DIFF)  States (A)  States (B)  States   (DIFF)
> > > > --------------------------------------  ---------  ---------  -----------------  ----------  ----------  ---------------
> > > > balancer_ingress                            27925      23608    -4317 (-15.46%)        1488        1482      -6 (-0.40%)
> > > > balancer_ingress                            31824      27546    -4278 (-13.44%)        1658        1652      -6 (-0.36%)
> > > > balancer_ingress                            32213      27935    -4278 (-13.28%)        1689        1683      -6 (-0.36%)
> > > > balancer_ingress                            32213      27935    -4278 (-13.28%)        1689        1683      -6 (-0.36%)
> > > > balancer_ingress                            31824      27546    -4278 (-13.44%)        1658        1652      -6 (-0.36%)
> > > > balancer_ingress                            38647      29562    -9085 (-23.51%)        2069        1835   -234 (-11.31%)
> > > > balancer_ingress                            38647      29562    -9085 (-23.51%)        2069        1835   -234 (-11.31%)
> > > > balancer_ingress                            40339      30792    -9547 (-23.67%)        2193        1934   -259 (-11.81%)
> > > > balancer_ingress                            37321      29055    -8266 (-22.15%)        1972        1795    -177 (-8.98%)
> > > > balancer_ingress                            38176      29753    -8423 (-22.06%)        2008        1831    -177 (-8.81%)
> > > > balancer_ingress                            29193      20910    -8283 (-28.37%)        1599        1422   -177 (-11.07%)
> > > > balancer_ingress                            30013      21452    -8561 (-28.52%)        1645        1447   -198 (-12.04%)
> > > > balancer_ingress                            28691      24290    -4401 (-15.34%)        1545        1531     -14 (-0.91%)
> > > > balancer_ingress                            34223      28965    -5258 (-15.36%)        1984        1875    -109 (-5.49%)
> > > > balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
> > > > balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
> > > > balancer_ingress                            35868      26455    -9413 (-26.24%)        2140        1827   -313 (-14.63%)
> > > > balancer_ingress                            35868      26455    -9413 (-26.24%)        2140        1827   -313 (-14.63%)
> > > > balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
> > > > balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
> > > > balancer_ingress                            34844      29485    -5359 (-15.38%)        2036        1918    -118 (-5.80%)
> > > > fbflow_egress                                3256       2652     -604 (-18.55%)         218         192    -26 (-11.93%)
> > > > fbflow_ingress                               1026        944       -82 (-7.99%)          70          63     -7 (-10.00%)
> > > > sslwall_tc_egress                            8424       7360    -1064 (-12.63%)         498         458     -40 (-8.03%)
> > > > syar_accept_protect                         15040       9539    -5501 (-36.58%)         364         220   -144 (-39.56%)
> > > > syar_connect_tcp_v6                         15036       9535    -5501 (-36.59%)         360         216   -144 (-40.00%)
> > > > syar_connect_udp_v4                         15039       9538    -5501 (-36.58%)         361         217   -144 (-39.89%)
> > > > syar_connect_connect4_protect4              24805      15833    -8972 (-36.17%)         756         480   -276 (-36.51%)
> > > > syar_lsm_file_open                         167772     151813    -15959 (-9.51%)        1836        1667    -169 (-9.20%)
> > > > syar_namespace_create_new                   14805       9304    -5501 (-37.16%)         353         209   -144 (-40.79%)
> > > > syar_python3_detect                         17531      12030    -5501 (-31.38%)         391         247   -144 (-36.83%)
> > > > syar_ssh_post_fork                          16412      10911    -5501 (-33.52%)         405         261   -144 (-35.56%)
> > > > syar_enter_execve                           14728       9227    -5501 (-37.35%)         345         201   -144 (-41.74%)
> > > > syar_enter_execveat                         14728       9227    -5501 (-37.35%)         345         201   -144 (-41.74%)
> > > > syar_exit_execve                            16622      11121    -5501 (-33.09%)         376         232   -144 (-38.30%)
> > > > syar_exit_execveat                          16622      11121    -5501 (-33.09%)         376         232   -144 (-38.30%)
> > > > syar_syscalls_kill                          15288       9787    -5501 (-35.98%)         398         254   -144 (-36.18%)
> > > > syar_task_enter_pivot_root                  14898       9397    -5501 (-36.92%)         357         213   -144 (-40.34%)
> > > > syar_syscalls_setreuid                      16678      11177    -5501 (-32.98%)         429         285   -144 (-33.57%)
> > > > syar_syscalls_setuid                        16678      11177    -5501 (-32.98%)         429         285   -144 (-33.57%)
> > > > syar_syscalls_process_vm_readv              14959       9458    -5501 (-36.77%)         364         220   -144 (-39.56%)
> > > > syar_syscalls_process_vm_writev             15757      10256    -5501 (-34.91%)         390         246   -144 (-36.92%)
> > > > do_uprobe                                   15519      10018    -5501 (-35.45%)         373         229   -144 (-38.61%)
> > > > edgewall                                   179715      55783  -123932 (-68.96%)       12607        3999  -8608 (-68.28%)
> > > > bictcp_state                                 7570       4131    -3439 (-45.43%)         496         269   -227 (-45.77%)
> > > > cubictcp_state                               7570       4131    -3439 (-45.43%)         496         269   -227 (-45.77%)
> > > > tcp_rate_skb_delivered                        447        272     -175 (-39.15%)          29          18    -11 (-37.93%)
> > > > kprobe__bbr_set_state                        4566       2615    -1951 (-42.73%)         209         124    -85 (-40.67%)
> > > > kprobe__bictcp_state                         4566       2615    -1951 (-42.73%)         209         124    -85 (-40.67%)
> > > > inet_sock_set_state                          1501       1337     -164 (-10.93%)          93          85      -8 (-8.60%)
> > > > tcp_retransmit_skb                           1145        981     -164 (-14.32%)          67          59     -8 (-11.94%)
> > > > tcp_retransmit_synack                        1183        951     -232 (-19.61%)          67          55    -12 (-17.91%)
> > > > bpf_tcptuner                                 1459       1187     -272 (-18.64%)          99          80    -19 (-19.19%)
> > > > tw_egress                                     801        776       -25 (-3.12%)          69          66      -3 (-4.35%)
> > > > tw_ingress                                    795        770       -25 (-3.14%)          69          66      -3 (-4.35%)
> > > > ttls_tc_ingress                             19025      19383      +358 (+1.88%)         470         465      -5 (-1.06%)
> > > > ttls_nat_egress                               490        299     -191 (-38.98%)          33          20    -13 (-39.39%)
> > > > ttls_nat_ingress                              448        285     -163 (-36.38%)          32          21    -11 (-34.38%)
> > > > tw_twfw_egress                             511127     212071  -299056 (-58.51%)       16733        8504  -8229 (-49.18%)
> > > > tw_twfw_ingress                            500095     212069  -288026 (-57.59%)       16223        8504  -7719 (-47.58%)
> > > > tw_twfw_tc_eg                              511113     212064  -299049 (-58.51%)       16732        8504  -8228 (-49.18%)
> > > > tw_twfw_tc_in                              500095     212069  -288026 (-57.59%)       16223        8504  -7719 (-47.58%)
> > > > tw_twfw_egress                              12632      12435      -197 (-1.56%)         276         260     -16 (-5.80%)
> > > > tw_twfw_ingress                             12631      12454      -177 (-1.40%)         278         261     -17 (-6.12%)
> > > > tw_twfw_tc_eg                               12595      12435      -160 (-1.27%)         274         259     -15 (-5.47%)
> > > > tw_twfw_tc_in                               12631      12454      -177 (-1.40%)         278         261     -17 (-6.12%)
> > > > tw_xdp_dump                                   266        209      -57 (-21.43%)           9           8     -1 (-11.11%)
> > > >
> > > > CILIUM
> > > > =========
> > > > File           Program                           Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States  (DIFF)
> > > > -------------  --------------------------------  ---------  ---------  ----------------  ----------  ----------  --------------
> > > > bpf_host.o     cil_to_netdev                          6047       4578   -1469 (-24.29%)         362         249  -113 (-31.22%)
> > > > bpf_host.o     handle_lxc_traffic                     2227       1585    -642 (-28.83%)         156         103   -53 (-33.97%)
> > > > bpf_host.o     tail_handle_ipv4_from_netdev           2244       1458    -786 (-35.03%)         163         106   -57 (-34.97%)
> > > > bpf_host.o     tail_handle_nat_fwd_ipv4              21022      10479  -10543 (-50.15%)        1289         670  -619 (-48.02%)
> > > > bpf_host.o     tail_handle_nat_fwd_ipv6              15433      11375   -4058 (-26.29%)         905         643  -262 (-28.95%)
> > > > bpf_host.o     tail_ipv4_host_policy_ingress          2219       1367    -852 (-38.40%)         161          96   -65 (-40.37%)
> > > > bpf_host.o     tail_nodeport_nat_egress_ipv4         22460      19862   -2598 (-11.57%)        1469        1293  -176 (-11.98%)
> > > > bpf_host.o     tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
> > > > bpf_host.o     tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
> > > > bpf_host.o     tail_nodeport_nat_ipv6_egress          3702       3542     -160 (-4.32%)         215         205    -10 (-4.65%)
> > > > bpf_lxc.o      tail_handle_nat_fwd_ipv4              21022      10479  -10543 (-50.15%)        1289         670  -619 (-48.02%)
> > > > bpf_lxc.o      tail_handle_nat_fwd_ipv6              15433      11375   -4058 (-26.29%)         905         643  -262 (-28.95%)
> > > > bpf_lxc.o      tail_ipv4_ct_egress                    5073       3374   -1699 (-33.49%)         262         172   -90 (-34.35%)
> > > > bpf_lxc.o      tail_ipv4_ct_ingress                   5093       3385   -1708 (-33.54%)         262         172   -90 (-34.35%)
> > > > bpf_lxc.o      tail_ipv4_ct_ingress_policy_only       5093       3385   -1708 (-33.54%)         262         172   -90 (-34.35%)
> > > > bpf_lxc.o      tail_ipv6_ct_egress                    4593       3878    -715 (-15.57%)         194         151   -43 (-22.16%)
> > > > bpf_lxc.o      tail_ipv6_ct_ingress                   4606       3891    -715 (-15.52%)         194         151   -43 (-22.16%)
> > > > bpf_lxc.o      tail_ipv6_ct_ingress_policy_only       4606       3891    -715 (-15.52%)         194         151   -43 (-22.16%)
> > > > bpf_lxc.o      tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
> > > > bpf_lxc.o      tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
> > > > bpf_overlay.o  tail_handle_nat_fwd_ipv4              20524      10114  -10410 (-50.72%)        1271         638  -633 (-49.80%)
> > > > bpf_overlay.o  tail_nodeport_nat_egress_ipv4         22718      19490   -3228 (-14.21%)        1475        1275  -200 (-13.56%)
> > > > bpf_overlay.o  tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
> > > > bpf_overlay.o  tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
> > > > bpf_overlay.o  tail_nodeport_nat_ipv6_egress          3638       3548      -90 (-2.47%)         209         203     -6 (-2.87%)
> > > > bpf_overlay.o  tail_rev_nodeport_lb4                  4368       3820    -548 (-12.55%)         248         215   -33 (-13.31%)
> > > > bpf_overlay.o  tail_rev_nodeport_lb6                  2867       2428    -439 (-15.31%)         167         140   -27 (-16.17%)
> > > > bpf_sock.o     cil_sock6_connect                      1718       1703      -15 (-0.87%)         100          99     -1 (-1.00%)
> > > > bpf_xdp.o      tail_handle_nat_fwd_ipv4              12917      12443     -474 (-3.67%)         875         849    -26 (-2.97%)
> > > > bpf_xdp.o      tail_handle_nat_fwd_ipv6              13515      13264     -251 (-1.86%)         715         702    -13 (-1.82%)
> > > > bpf_xdp.o      tail_lb_ipv4                          39492      36367    -3125 (-7.91%)        2430        2251   -179 (-7.37%)
> > > > bpf_xdp.o      tail_lb_ipv6                          80441      78058    -2383 (-2.96%)        3647        3523   -124 (-3.40%)
> > > > bpf_xdp.o      tail_nodeport_ipv6_dsr                 1038        901    -137 (-13.20%)          61          55     -6 (-9.84%)
> > > > bpf_xdp.o      tail_nodeport_nat_egress_ipv4         13027      12096     -931 (-7.15%)         868         809    -59 (-6.80%)
> > > > bpf_xdp.o      tail_nodeport_nat_ingress_ipv4         7617       5900   -1717 (-22.54%)         522         413  -109 (-20.88%)
> > > > bpf_xdp.o      tail_nodeport_nat_ingress_ipv6         7575       7395     -180 (-2.38%)         383         374     -9 (-2.35%)
> > > > bpf_xdp.o      tail_rev_nodeport_lb4                  6808       6739      -69 (-1.01%)         403         396     -7 (-1.74%)
> > > > bpf_xdp.o      tail_rev_nodeport_lb6                 16173      15847     -326 (-2.02%)        1010         990    -20 (-1.98%)
> > > >
> > >
> > > So I also want to mention that while I did spot-check a few programs
> > > (not the biggest ones) and they did seem to have correct verification
> > > flow, I obviously can't easily validate verifier log_level=2 logs for
> > > all of the changes above, especially those multi-thousand-state
> > > programs. I'd really appreciate it if someone from Isovalent/Cilium
> > > could sanity-check a Cilium program or two, just in case. Thanks!
> >
> > fyi, I was curious so I tried that on top of the tetragon programs;
> > results seem up and down, but verification time is mostly lower ;-)
> >
> 
> Nice! Can you please regenerate the results and sort by either insns_diff
> (absolute difference, not percentage) or states_diff? It would be
> easier to see the top 10 improvements and regressions that way.
> Percentages by themselves can be misleading.
> 
> Oh, and peak states are probably not that useful, so maybe just use
> `-e file,prog,duration,insns,states -s insns_diff`?

$ veristat --compare ./veristat.old ./veristat.new -e file,prog,duration,insns,states --sort insns_diff

File                            Program                        Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)  Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States   (DIFF)
------------------------------  -----------------------------  -----------------  -----------------  --------------------  ---------  ---------  ----------------  ----------  ----------  ---------------
bpf_generic_kprobe_v61.o        generic_kprobe_process_event4              11141              12815       +1674 (+15.03%)      58981      74350  +15369 (+26.06%)        1292        1522   +230 (+17.80%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event4              14745              14029         -716 (-4.86%)      58981      74350  +15369 (+26.06%)        1292        1522   +230 (+17.80%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event4             106100             111486        +5386 (+5.08%)     296244     308555   +12311 (+4.16%)       16249       16386    +137 (+0.84%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event4             121800             111903        -9897 (-8.13%)     296244     308555   +12311 (+4.16%)       16249       16386    +137 (+0.84%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event3              13016              15029       +2013 (+15.47%)      68447      75715   +7268 (+10.62%)        1477        1565     +88 (+5.96%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event3              14824              13829         -995 (-6.71%)      68447      75715   +7268 (+10.62%)        1477        1565     +88 (+5.96%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event1              12683              14576       +1893 (+14.93%)      68450      75716   +7266 (+10.62%)        1477        1566     +89 (+6.03%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event1              16763              13670       -3093 (-18.45%)      68450      75716   +7266 (+10.62%)        1477        1566     +89 (+6.03%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event2              12822              14709       +1887 (+14.72%)      68450      75715   +7265 (+10.61%)        1477        1566     +89 (+6.03%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event2              14321              14000         -321 (-2.24%)      68450      75715   +7265 (+10.61%)        1477        1566     +89 (+6.03%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event1             108349             106105        -2244 (-2.07%)     313458     315263    +1805 (+0.58%)       16524       16544     +20 (+0.12%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event2             109991             105951        -4040 (-3.67%)     313458     315263    +1805 (+0.58%)       16524       16544     +20 (+0.12%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event3             110279             109525         -754 (-0.68%)     313455     315260    +1805 (+0.58%)       16524       16544     +20 (+0.12%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event1             132058             106791      -25267 (-19.13%)     313458     315263    +1805 (+0.58%)       16524       16544     +20 (+0.12%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event2             122505             106459      -16046 (-13.10%)     313458     315263    +1805 (+0.58%)       16524       16544     +20 (+0.12%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event3             127258             106633      -20625 (-16.21%)     313455     315260    +1805 (+0.58%)       16524       16544     +20 (+0.12%)
bpf_cgroup_mkdir.o              tg_tp_cgrp_mkdir                             206                190          -16 (-7.77%)        581        581       +0 (+0.00%)          24          24      +0 (+0.00%)
bpf_cgroup_release.o            tg_tp_cgrp_release                           114                104          -10 (-8.77%)        381        381       +0 (+0.00%)          13          13      +0 (+0.00%)
bpf_cgroup_rmdir.o              tg_tp_cgrp_rmdir                             126                121           -5 (-3.97%)        381        381       +0 (+0.00%)          13          13      +0 (+0.00%)
bpf_execve_bprm_commit_creds.o  tg_kp_bprm_committing_creds                  100                 95           -5 (-5.00%)        163        163       +0 (+0.00%)          14          14      +0 (+0.00%)
bpf_execve_event.o              execve_send                                   93                 57         -36 (-38.71%)         82         82       +0 (+0.00%)           6           6      +0 (+0.00%)
bpf_execve_event_v53.o          execve_send                                   52                 54           +2 (+3.85%)        105        105       +0 (+0.00%)           5           5      +0 (+0.00%)
bpf_execve_event_v61.o          execve_send                                   66                 69           +3 (+4.55%)        105        105       +0 (+0.00%)           5           5      +0 (+0.00%)
bpf_exit.o                      event_exit                                    65                 53         -12 (-18.46%)         94         94       +0 (+0.00%)           8           8      +0 (+0.00%)
bpf_fork.o                      event_wake_up_new_task                       179                209         +30 (+16.76%)        514        514       +0 (+0.00%)          30          30      +0 (+0.00%)
bpf_generic_kprobe.o            generic_fmodret_override                      67                 70           +3 (+4.48%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_actions                      2386               1893        -493 (-20.66%)       6746       6746       +0 (+0.00%)         287         287      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_event                         302                306           +4 (+1.32%)        580        580       +0 (+0.00%)          47          47      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg1                  2679               2464         -215 (-8.03%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg2                  2487               2777        +290 (+11.66%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg3                  2905               2620         -285 (-9.81%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg4                  2834               2706         -128 (-4.52%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_filter_arg5                  2771               2621         -150 (-5.41%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_output                         44                 41           -3 (-6.82%)         29         29       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_kprobe.o            generic_kprobe_override                       40                 39           -1 (-2.50%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_fmodret_override                      64                 66           +2 (+3.12%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_actions                     23258              14115       -9143 (-39.31%)      42545      42545       +0 (+0.00%)        1434        1434      +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_event                         298                303           +5 (+1.68%)        583        583       +0 (+0.00%)          47          47      +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_output                        119                148         +29 (+24.37%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)
bpf_generic_kprobe_v53.o        generic_kprobe_override                       38                 39           +1 (+2.63%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_fmodret_override                      94                 89           -5 (-5.32%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_actions                     15903              15072         -831 (-5.23%)      42545      42545       +0 (+0.00%)        1434        1434      +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_event                         303                340         +37 (+12.21%)        583        583       +0 (+0.00%)          47          47      +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_output                        153                149           -4 (-2.61%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)
bpf_generic_kprobe_v61.o        generic_kprobe_override                       56                 51           -5 (-8.93%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_actions                  2259               1998        -261 (-11.55%)       6692       6692       +0 (+0.00%)         295         295      +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg1                     2523               2569          +46 (+1.82%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg2                     2853               2692         -161 (-5.64%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg3                     2522               2902        +380 (+15.07%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg4                     2538               2837        +299 (+11.78%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_arg5                     2598               2640          +42 (+1.62%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_event                     691                617         -74 (-10.71%)       1487       1487       +0 (+0.00%)          92          92      +0 (+0.00%)
bpf_generic_tracepoint.o        generic_tracepoint_output                     41                 36          -5 (-12.20%)         29         29       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_actions                 15139              14536         -603 (-3.98%)      41191      41191       +0 (+0.00%)        1397        1397      +0 (+0.00%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event                     581                591          +10 (+1.72%)       1490       1490       +0 (+0.00%)          92          92      +0 (+0.00%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_output                    136                136           +0 (+0.00%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_actions                 16298              14731        -1567 (-9.61%)      41191      41191       +0 (+0.00%)        1397        1397      +0 (+0.00%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event                     555                531          -24 (-4.32%)       1490       1490       +0 (+0.00%)          92          92      +0 (+0.00%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_output                    120                141         +21 (+17.50%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_actions                      1767               1928         +161 (+9.11%)       5702       5702       +0 (+0.00%)         248         248      +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_event                         232                207         -25 (-10.78%)        429        429       +0 (+0.00%)          33          33      +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg1                  2764               2832          +68 (+2.46%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg2                  2639               2675          +36 (+1.36%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg3                  3875               2529       -1346 (-34.74%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg4                  2646               2540         -106 (-4.01%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_filter_arg5                  2510               2674         +164 (+6.53%)       6966       6966       +0 (+0.00%)         451         451      +0 (+0.00%)
bpf_generic_uprobe.o            generic_uprobe_output                         41                 39           -2 (-4.88%)         29         29       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_uprobe_v53.o        generic_uprobe_actions                     14216              14310          +94 (+0.66%)      39443      39443       +0 (+0.00%)        1336        1336      +0 (+0.00%)
bpf_generic_uprobe_v53.o        generic_uprobe_event                         236                223          -13 (-5.51%)        433        433       +0 (+0.00%)          33          33      +0 (+0.00%)
bpf_generic_uprobe_v53.o        generic_uprobe_output                        148                144           -4 (-2.70%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)
bpf_generic_uprobe_v61.o        generic_uprobe_actions                     14050              14958         +908 (+6.46%)      39443      39443       +0 (+0.00%)        1336        1336      +0 (+0.00%)
bpf_generic_uprobe_v61.o        generic_uprobe_event                         241                309         +68 (+28.22%)        433        433       +0 (+0.00%)          33          33      +0 (+0.00%)
bpf_generic_uprobe_v61.o        generic_uprobe_output                        138                146           +8 (+5.80%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)
bpf_globals.o                   read_globals_test                              0                  0           +0 (+0.00%)          0          0       +0 (+0.00%)           0           0      +0 (+0.00%)
bpf_killer.o                    killer                                        27                 28           +1 (+3.70%)         33         33       +0 (+0.00%)           3           3      +0 (+0.00%)
bpf_loader.o                    loader_kprobe                                 84                 82           -2 (-2.38%)        144        144       +0 (+0.00%)          10          10      +0 (+0.00%)
bpf_lseek.o                     test_lseek                                    54                 41         -13 (-24.07%)         67         67       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_multi_killer.o              killer                                        22                 22           +0 (+0.00%)         33         33       +0 (+0.00%)           3           3      +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_fmodret_override                     108                 73         -35 (-32.41%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_actions                     29346              14095      -15251 (-51.97%)      42545      42545       +0 (+0.00%)        1434        1434      +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_event                         339                345           +6 (+1.77%)        585        585       +0 (+0.00%)          48          48      +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_output                        185                128         -57 (-30.81%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)
bpf_multi_kprobe_v53.o          generic_kprobe_override                       62                 41         -21 (-33.87%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_fmodret_override                      71                 91         +20 (+28.17%)         18         18       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_actions                     16654              15088        -1566 (-9.40%)      42545      42545       +0 (+0.00%)        1434        1434      +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_event                         517                278        -239 (-46.23%)        585        585       +0 (+0.00%)          48          48      +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_output                        153                150           -3 (-1.96%)        252        252       +0 (+0.00%)          19          19      +0 (+0.00%)
bpf_multi_kprobe_v61.o          generic_kprobe_override                       40                 51         +11 (+27.50%)         20         20       +0 (+0.00%)           2           2      +0 (+0.00%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event4                  97822              89913        -7909 (-8.09%)     215757     215704      -53 (-0.02%)       14951       14942      -9 (-0.06%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event4              99896              96233        -3663 (-3.67%)     215757     215704      -53 (-0.02%)       14951       14942      -9 (-0.06%)
bpf_generic_kprobe.o            generic_kprobe_process_event3               7581               7024         -557 (-7.35%)      19779      19680      -99 (-0.50%)        1348        1338     -10 (-0.74%)
bpf_generic_tracepoint.o        generic_tracepoint_event3                   7296               7587         +291 (+3.99%)      19779      19680      -99 (-0.50%)        1348        1338     -10 (-0.74%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event0                  94250              96057        +1807 (+1.92%)     215685     215586      -99 (-0.05%)       14954       14938     -16 (-0.11%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event1                  93947              95801        +1854 (+1.97%)     215701     215602      -99 (-0.05%)       14955       14941     -14 (-0.09%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event2                  96306              95407         -899 (-0.93%)     215701     215602      -99 (-0.05%)       14955       14941     -14 (-0.09%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_event3                  97718              90734        -6984 (-7.15%)     215698     215599      -99 (-0.05%)       14955       14941     -14 (-0.09%)
bpf_generic_uprobe.o            generic_uprobe_process_event3               8127               6999       -1128 (-13.88%)      19779      19680      -99 (-0.50%)        1348        1338     -10 (-0.74%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event1             104517              90211      -14306 (-13.69%)     215701     215602      -99 (-0.05%)       14955       14941     -14 (-0.09%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event2             101025              90027      -10998 (-10.89%)     215701     215602      -99 (-0.05%)       14955       14941     -14 (-0.09%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event3              99776              95596        -4180 (-4.19%)     215698     215599      -99 (-0.05%)       14955       14941     -14 (-0.09%)
bpf_generic_kprobe.o            generic_kprobe_process_event1               7239               7468         +229 (+3.16%)      19782      19681     -101 (-0.51%)        1348        1339      -9 (-0.67%)
bpf_generic_tracepoint.o        generic_tracepoint_event1                   7347               9822       +2475 (+33.69%)      19782      19681     -101 (-0.51%)        1348        1339      -9 (-0.67%)
bpf_generic_uprobe.o            generic_uprobe_process_event1               8326               8041         -285 (-3.42%)      19782      19681     -101 (-0.51%)        1348        1339      -9 (-0.67%)
bpf_generic_kprobe.o            generic_kprobe_process_event2               7415               7691         +276 (+3.72%)      19782      19680     -102 (-0.52%)        1348        1339      -9 (-0.67%)
bpf_generic_kprobe.o            generic_kprobe_process_event4               8016               7572         -444 (-5.54%)      19760      19658     -102 (-0.52%)        1355        1344     -11 (-0.81%)
bpf_generic_tracepoint.o        generic_tracepoint_event2                   7218               7804         +586 (+8.12%)      19782      19680     -102 (-0.52%)        1348        1339      -9 (-0.67%)
bpf_generic_tracepoint.o        generic_tracepoint_event4                   7215               8109        +894 (+12.39%)      19760      19658     -102 (-0.52%)        1355        1344     -11 (-0.81%)
bpf_generic_uprobe.o            generic_uprobe_process_event2               8183               7016       -1167 (-14.26%)      19782      19680     -102 (-0.52%)        1348        1339      -9 (-0.67%)
bpf_generic_uprobe.o            generic_uprobe_process_event4               8072               7185        -887 (-10.99%)      19760      19658     -102 (-0.52%)        1355        1344     -11 (-0.81%)
bpf_generic_tracepoint.o        generic_tracepoint_event0                   7566               8026         +460 (+6.08%)      20592      20479     -113 (-0.55%)        1421        1409     -12 (-0.84%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_event0             102334             101040        -1294 (-1.26%)     283295     283172     -123 (-0.04%)       16044       16033     -11 (-0.07%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_event0             113628             100702      -12926 (-11.38%)     283295     283172     -123 (-0.04%)       16044       16033     -11 (-0.07%)
bpf_generic_uprobe.o            generic_uprobe_process_event0               7804               8154         +350 (+4.48%)      21063      20890     -173 (-0.82%)        1419        1400     -19 (-1.34%)
bpf_generic_retkprobe_v53.o     generic_retkprobe_event                   108357             105058        -3299 (-3.04%)     231680     231505     -175 (-0.08%)       16131       16113     -18 (-0.11%)
bpf_multi_retkprobe_v53.o       generic_retkprobe_event                   127625             110224      -17401 (-13.63%)     231631     231456     -175 (-0.08%)       16130       16112     -18 (-0.11%)
bpf_generic_retkprobe_v61.o     generic_retkprobe_event                    10694              11197         +503 (+4.70%)      24960      24775     -185 (-0.74%)        1854        1842     -12 (-0.65%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_event0             103254              90496      -12758 (-12.36%)     215852     215620     -232 (-0.11%)       14972       14952     -20 (-0.13%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event0               2194               2133          -61 (-2.78%)       4395       4152     -243 (-5.53%)         329         312     -17 (-5.17%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event4                   2094               1910         -184 (-8.79%)       4396       4124     -272 (-6.19%)         323         304     -19 (-5.88%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event4               1950               2031          +81 (+4.15%)       4396       4124     -272 (-6.19%)         323         304     -19 (-5.88%)
bpf_generic_retkprobe.o         generic_retkprobe_event                    11526              11239         -287 (-2.49%)      28282      28008     -274 (-0.97%)        1973        1949     -24 (-1.22%)
bpf_multi_retkprobe_v61.o       generic_retkprobe_event                    12110               9753       -2357 (-19.46%)      24404      24110     -294 (-1.20%)        1859        1841     -18 (-0.97%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg1                 25215              26076         +861 (+3.41%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg2                 24813              24288         -525 (-2.12%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg3                 26494              24362        -2132 (-8.05%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg4                 24373              24041         -332 (-1.36%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v53.o        generic_kprobe_filter_arg5                 26265              24317        -1948 (-7.42%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg1                 25870              24169        -1701 (-6.58%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg2                 26667              24070        -2597 (-9.74%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg3                 27248              24758        -2490 (-9.14%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg4                 27483              26107        -1376 (-5.01%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_kprobe_v61.o        generic_kprobe_filter_arg5                 26764              26316         -448 (-1.67%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg1                    26569              23775       -2794 (-10.52%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg2                    26853              24057       -2796 (-10.41%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg3                    27067              24044       -3023 (-11.17%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg4                    24410              23953         -457 (-1.87%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_arg5                    30439              24792       -5647 (-18.55%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg1                    27534              23721       -3813 (-13.85%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg2                    28248              24052       -4196 (-14.85%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg3                    29118              24012       -5106 (-17.54%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg4                    33309              23915       -9394 (-28.20%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_arg5                    28057              24983       -3074 (-10.96%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg1                 28012              26052        -1960 (-7.00%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg2                 27759              26451        -1308 (-4.71%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg3                 27301              25856        -1445 (-5.29%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg4                 26331              26187         -144 (-0.55%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v53.o        generic_uprobe_filter_arg5                 27284              26122        -1162 (-4.26%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg1                 30324              26943       -3381 (-11.15%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg2                 26755              26758           +3 (+0.01%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg3                 28337              27992         -345 (-1.22%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg4                 26332              27308         +976 (+3.71%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_uprobe_v61.o        generic_uprobe_filter_arg5                 27209              26780         -429 (-1.58%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg1                 33490              23550       -9940 (-29.68%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg2                 42586              24318      -18268 (-42.90%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg3                 39256              24731      -14525 (-37.00%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg4                 41607              23955      -17652 (-42.43%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v53.o          generic_kprobe_filter_arg5                 49382              24518      -24864 (-50.35%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg1                 41140              26793      -14347 (-34.87%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg2                 30326              26454       -3872 (-12.77%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg3                 38517              24452      -14065 (-36.52%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg4                 36157              24539      -11618 (-32.13%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_multi_kprobe_v61.o          generic_kprobe_filter_arg5                 40673              25657      -15016 (-36.92%)      91872      91575     -297 (-0.32%)        2910        2900     -10 (-0.34%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event0                   2128               2058          -70 (-3.29%)       4403       4100     -303 (-6.88%)         326         305     -21 (-6.44%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event1                   1982               2028          +46 (+2.32%)       4409       4106     -303 (-6.87%)         328         304     -24 (-7.32%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event2                   2357               2054        -303 (-12.86%)       4409       4106     -303 (-6.87%)         328         304     -24 (-7.32%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_event3                   2018               1835         -183 (-9.07%)       4406       4103     -303 (-6.88%)         328         304     -24 (-7.32%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event1               1885               1832          -53 (-2.81%)       4409       4106     -303 (-6.87%)         328         304     -24 (-7.32%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event2               2775               1966        -809 (-29.15%)       4409       4106     -303 (-6.87%)         328         304     -24 (-7.32%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_event3               3237               2004       -1233 (-38.09%)       4406       4103     -303 (-6.88%)         328         304     -24 (-7.32%)
bpf_generic_kprobe.o            generic_kprobe_process_event0               7817               7945         +128 (+1.64%)      21321      21001     -320 (-1.50%)        1440        1403     -37 (-2.57%)
bpf_execve_event.o              event_execve                               12147              12843         +696 (+5.73%)      35096      34723     -373 (-1.06%)        2278        2251     -27 (-1.19%)
bpf_execve_event_v61.o          event_execve                                6094               6059          -35 (-0.57%)      27456      26871     -585 (-2.13%)         671         636     -35 (-5.22%)
bpf_execve_event_v53.o          event_execve                               97457              98430         +973 (+1.00%)     245365     239363    -6002 (-2.45%)       15430       15334     -96 (-0.62%)
bpf_generic_kprobe_v53.o        generic_kprobe_process_filter              57465              54691        -2774 (-4.83%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_filter              57674              51652       -6022 (-10.44%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)
bpf_generic_tracepoint_v53.o    generic_tracepoint_filter                  64076              50012      -14064 (-21.95%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)
bpf_generic_tracepoint_v61.o    generic_tracepoint_filter                  63620              50068      -13552 (-21.30%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)
bpf_generic_uprobe_v53.o        generic_uprobe_process_filter              65621              56496       -9125 (-13.91%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)
bpf_generic_uprobe_v61.o        generic_uprobe_process_filter              62774              56727        -6047 (-9.63%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)
bpf_multi_kprobe_v53.o          generic_kprobe_process_filter              73918              54826      -19092 (-25.83%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_filter              73994              54979      -19015 (-25.70%)     166600     158639    -7961 (-4.78%)        7263        6602    -661 (-9.10%)
bpf_generic_kprobe_v61.o        generic_kprobe_process_event0              11184              10303         -881 (-7.88%)      58564      49822   -8742 (-14.93%)        1243        1108   -135 (-10.86%)
bpf_multi_kprobe_v61.o          generic_kprobe_process_event0              17270               9818       -7452 (-43.15%)      58564      49822   -8742 (-14.93%)        1243        1108   -135 (-10.86%)
bpf_generic_kprobe.o            generic_kprobe_process_filter              43093              31779      -11314 (-26.25%)      77948      66684  -11264 (-14.45%)        6048        5009  -1039 (-17.18%)
bpf_generic_tracepoint.o        generic_tracepoint_filter                  41153              33891       -7262 (-17.65%)      77948      66684  -11264 (-14.45%)        6048        5009  -1039 (-17.18%)
bpf_generic_uprobe.o            generic_uprobe_process_filter              40999              31572       -9427 (-22.99%)      77948      66684  -11264 (-14.45%)        6048        5009  -1039 (-17.18%)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-10-31  5:03 ` [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states Andrii Nakryiko
@ 2023-11-09 15:20   ` Eduard Zingerman
  2023-11-09 16:13     ` Alexei Starovoitov
  0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> Instead of allocating and copying jump history each time we enqueue
> child verifier state, switch to a model where we use one common
> dynamically sized array of instruction jumps across all states.
> 
> The key observation for proving this is correct is that jmp_history is
> only relevant while state is active, which means it either is a current
> state (and thus we are actively modifying jump history and no other
> state can interfere with us) or it is a checkpointed state with some
> children still active (either enqueued or being current).
> 
> In the latter case our portion of jump history is finalized and won't
> change or grow, so as long as we keep it immutable until the state is
> finalized, we are good.
> 
> Now, when state is finalized and is put into state hash for potentially
> future pruning lookups, jump history is not used anymore. This is
> because jump history is only used by precision marking logic, and we
> never modify precision markings for finalized states.
> 
> So, instead of each state having its own small jump history, we keep
> a global dynamically-sized jump history, where each state in current DFS
> path from root to active state remembers its portion of jump history.
> Current state can append to this history, but cannot modify any of its
> parent histories.
> 
> Because the jmp_history array can be grown through realloc, states don't
> keep pointers, they instead maintain two indexes [start, end) into
> global jump history array. End is exclusive index, so start == end means
> there is no relevant jump history.
> 
> This should eliminate a lot of allocations and minimize overall memory
> usage (but I haven't benchmarked on real hardware, and QEMU benchmarking
> is too noisy).
> 
> Also, in the next patch we'll extend jump history to maintain additional
> markings for some instructions even if there was no jump, so in
> preparation for that call this thing a more generic "instruction history".
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
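
To restate the [start, end) convention in code (my own sketch, not a
quote from the patch): when a checkpoint is created, the continuing
current state starts a fresh, empty portion right after it:

	cur->insn_hist_start = cur->insn_hist_end;	/* start == end: empty */

and precision backtracking walks the current state's [start, end)
portion first, then follows the parent chain through each ancestor's
portion.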

Nitpick: could you please add a comment somewhere in the code
(is_state_visited? pop_stack?) saying something like this:

  states in the env->head happen to be sorted by insn_hist_end in
  descending order, so popping next state for verification poses no
  risk of overwriting history relevant for states remaining in
  env->head.

Side note: this change would make it harder to change the states traversal
order to something other than DFS, should we choose to do so.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
  2023-10-31  5:03 ` [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
@ 2023-11-09 15:20   ` Eduard Zingerman
  2023-11-09 17:20     ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team, Tao Lyu

On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:

All makes sense, a few nitpicks below.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> +/* instruction history flags, used in bpf_insn_hist_entry.flags field */
> +enum {
> +	/* instruction references stack slot through PTR_TO_STACK register;
> +	 * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
> +	 * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512,
> +	 * 8 bytes per slot, so slot index (spi) is [0, 63])
> +	 */
> +	INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */
> +
> +	INSN_F_SPI_MASK = 0x3f, /* 6 bits */
> +	INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */
> +
> +	INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */
> +};
> +
> +static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
> +static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
> +
>  struct bpf_insn_hist_entry {
> -	u32 prev_idx;
>  	u32 idx;
> +	/* insn idx can't be bigger than 1 million */
> +	u32 prev_idx : 22;
> +	/* special flags, e.g., whether insn is doing register stack spill/load */
> +	u32 flags : 10;
>  };

Nitpick: maybe use separate bit-fields for frameno and spi instead of
         flags? Or add dedicated accessor functions?
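
E.g., something along these lines (just a sketch, names invented):

	static inline u32 insn_hist_frameno(const struct bpf_insn_hist_entry *e)
	{
		return e->flags & INSN_F_FRAMENO_MASK;
	}

	static inline u32 insn_hist_spi(const struct bpf_insn_hist_entry *e)
	{
		return (e->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
	}

That would keep the bit-layout details in one place.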

>  
> -#define MAX_CALL_FRAMES 8
>  /* Maximum number of register states that can exist at once */
>  #define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES)
>  struct bpf_verifier_state {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2905ce2e8b34..fbb779583d52 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3479,14 +3479,20 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
>  }
>  
>  /* for any branch, call, exit record the history of jmps in the given state */
> -static int push_jmp_history(struct bpf_verifier_env *env,
> -			    struct bpf_verifier_state *cur)
> +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> +			     int insn_flags)
>  {
>  	struct bpf_insn_hist_entry *p;
>  	size_t alloc_size;
>  
> -	if (!is_jmp_point(env, env->insn_idx))
> +	/* combine instruction flags if we already recorded this instruction */
> +	if (cur->insn_hist_end > cur->insn_hist_start &&
> +	    (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> +	    p->idx == env->insn_idx &&
> +	    p->prev_idx == env->prev_insn_idx) {
> +		p->flags |= insn_flags;

Nitpick: maybe add an assert to check that frameno/spi are not or'ed?

[...]

> +static struct bpf_insn_hist_entry *get_hist_insn_entry(struct bpf_verifier_env *env,
> +						       u32 hist_start, u32 hist_end, int insn_idx)

Nitpick: maybe rename 'hist_insn' to 'insn_hist', i.e. 'get_insn_hist_entry'?

[...]

> @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>  
>  		/* Mark slots affected by this stack write. */
>  		for (i = 0; i < size; i++)
> -			state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> -				type;
> +			state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> +		insn_flags = 0; /* not a register spill */
>  	}
> +
> +	if (insn_flags)
> +		return push_insn_history(env, env->cur_state, insn_flags);

Maybe add a check that insn is BPF_ST or BPF_STX here?
Only these cases are supported by backtrack_insn() while
check_mem_access() is called from multiple places.
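
E.g., something like this (untested sketch; insn_idx is already a
parameter of check_stack_write_fixed_off()):

	if (insn_flags) {
		u8 class = BPF_CLASS(env->prog->insnsi[insn_idx].code);

		/* backtrack_insn() only understands ST/STX stack writes */
		if (WARN_ON_ONCE(class != BPF_ST && class != BPF_STX))
			return -EFAULT;
		return push_insn_history(env, env->cur_state, insn_flags);
	}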

>  	return 0;
>  }
>  
> @@ -4908,6 +4909,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
>  	int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
>  	struct bpf_reg_state *reg;
>  	u8 *stype, type;
> +	int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | reg_state->frameno;
>  
>  	stype = reg_state->stack[spi].slot_type;
>  	reg = &reg_state->stack[spi].spilled_ptr;
> @@ -4953,12 +4955,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
>  					return -EACCES;
>  				}
>  				mark_reg_unknown(env, state->regs, dst_regno);
> +				insn_flags = 0; /* not restoring original register state */
>  			}
>  			state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
> -			return 0;
> -		}
> -
> -		if (dst_regno >= 0) {
> +		} else if (dst_regno >= 0) {
>  			/* restore register state from stack */
>  			copy_register_state(&state->regs[dst_regno], reg);
>  			/* mark reg as written since spilled pointer state likely
> @@ -4994,7 +4994,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
>  		mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64);
>  		if (dst_regno >= 0)
>  			mark_reg_stack_read(env, reg_state, off, off + size, dst_regno);
> +		insn_flags = 0; /* we are not restoring spilled register */
>  	}
> +	if (insn_flags)
> +		return push_insn_history(env, env->cur_state, insn_flags);
>  	return 0;
>  }
>  
> @@ -7125,7 +7128,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
>  			       BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
>  	if (err)
>  		return err;
> -
>  	return 0;
>  }
>  
> @@ -17001,7 +17003,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>  			 * the precision needs to be propagated back in
>  			 * the current state.
>  			 */
> -			err = err ? : push_jmp_history(env, cur);
> +			if (is_jmp_point(env, env->insn_idx))
> +				err = err ? : push_insn_history(env, cur, 0);
>  			err = err ? : propagate_precision(env, &sl->state);
>  			if (err)
>  				return err;
> @@ -17265,7 +17268,7 @@ static int do_check(struct bpf_verifier_env *env)
>  		}
>  
>  		if (is_jmp_point(env, env->insn_idx)) {
> -			err = push_jmp_history(env, state);
> +			err = push_insn_history(env, state, 0);
>  			if (err)
>  				return err;
>  		}
> diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
> index db6b3143338b..88c4207c6b4c 100644
> --- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
> +++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
> @@ -487,7 +487,24 @@ __success __log_level(2)
>   * so we won't be able to mark stack slot fp-8 as precise, and so will
>   * fallback to forcing all as precise
>   */
> -__msg("mark_precise: frame0: falling back to forcing all scalars precise")
> +__msg("10: (0f) r1 += r7")
> +__msg("mark_precise: frame0: last_idx 10 first_idx 7 subseq_idx -1")
> +__msg("mark_precise: frame0: regs=r7 stack= before 9: (bf) r1 = r8")
> +__msg("mark_precise: frame0: regs=r7 stack= before 8: (27) r7 *= 4")
> +__msg("mark_precise: frame0: regs=r7 stack= before 7: (79) r7 = *(u64 *)(r10 -8)")
> +__msg("mark_precise: frame0: parent state regs= stack=-8:  R0_w=2 R6_w=1 R8_rw=map_value(off=0,ks=4,vs=16,imm=0) R10=fp0 fp-8_rw=P1")
> +__msg("mark_precise: frame0: last_idx 18 first_idx 0 subseq_idx 7")
> +__msg("mark_precise: frame0: regs= stack=-8 before 18: (95) exit")
> +__msg("mark_precise: frame1: regs= stack= before 17: (0f) r0 += r2")
> +__msg("mark_precise: frame1: regs= stack= before 16: (79) r2 = *(u64 *)(r1 +0)")
> +__msg("mark_precise: frame1: regs= stack= before 15: (79) r0 = *(u64 *)(r10 -16)")
> +__msg("mark_precise: frame1: regs= stack= before 14: (7b) *(u64 *)(r10 -16) = r2")
> +__msg("mark_precise: frame1: regs= stack= before 13: (7b) *(u64 *)(r1 +0) = r2")
> +__msg("mark_precise: frame1: regs=r2 stack= before 6: (85) call pc+6")
> +__msg("mark_precise: frame0: regs=r2 stack= before 5: (bf) r2 = r6")
> +__msg("mark_precise: frame0: regs=r6 stack= before 4: (07) r1 += -8")
> +__msg("mark_precise: frame0: regs=r6 stack= before 3: (bf) r1 = r10")
> +__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
>  __naked int subprog_spill_into_parent_stack_slot_precise(void)
>  {
>  	asm volatile (
> @@ -522,14 +539,68 @@ __naked int subprog_spill_into_parent_stack_slot_precise(void)
>  	);
>  }
>  
> -__naked __noinline __used
> -static __u64 subprog_with_checkpoint(void)
> +SEC("?raw_tp")
> +__success __log_level(2)
> +__msg("17: (0f) r1 += r0")
> +__msg("mark_precise: frame0: last_idx 17 first_idx 0 subseq_idx -1")
> +__msg("mark_precise: frame0: regs=r0 stack= before 16: (bf) r1 = r7")
> +__msg("mark_precise: frame0: regs=r0 stack= before 15: (27) r0 *= 4")
> +__msg("mark_precise: frame0: regs=r0 stack= before 14: (79) r0 = *(u64 *)(r10 -16)")
> +__msg("mark_precise: frame0: regs= stack=-16 before 13: (7b) *(u64 *)(r7 -8) = r0")
> +__msg("mark_precise: frame0: regs=r0 stack= before 12: (79) r0 = *(u64 *)(r8 +16)")
> +__msg("mark_precise: frame0: regs= stack=-16 before 11: (7b) *(u64 *)(r8 +16) = r0")
> +__msg("mark_precise: frame0: regs=r0 stack= before 10: (79) r0 = *(u64 *)(r7 -8)")
> +__msg("mark_precise: frame0: regs= stack=-16 before 9: (7b) *(u64 *)(r10 -16) = r0")
> +__msg("mark_precise: frame0: regs=r0 stack= before 8: (07) r8 += -32")
> +__msg("mark_precise: frame0: regs=r0 stack= before 7: (bf) r8 = r10")
> +__msg("mark_precise: frame0: regs=r0 stack= before 6: (07) r7 += -8")
> +__msg("mark_precise: frame0: regs=r0 stack= before 5: (bf) r7 = r10")
> +__msg("mark_precise: frame0: regs=r0 stack= before 21: (95) exit")
> +__msg("mark_precise: frame1: regs=r0 stack= before 20: (bf) r0 = r1")
> +__msg("mark_precise: frame1: regs=r1 stack= before 4: (85) call pc+15")
> +__msg("mark_precise: frame0: regs=r1 stack= before 3: (bf) r1 = r6")
> +__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
> +__naked int stack_slot_aliases_precision(void)
>  {
>  	asm volatile (
> -		"r0 = 0;"
> -		/* guaranteed checkpoint if BPF_F_TEST_STATE_FREQ is used */
> -		"goto +0;"
> +		"r6 = 1;"
> +		/* pass r6 through r1 into subprog to get it back as r0;
> +		 * this whole chain will have to be marked as precise later
> +		 */
> +		"r1 = r6;"
> +		"call identity_subprog;"
> +		/* let's setup two registers that are aliased to r10 */
> +		"r7 = r10;"
> +		"r7 += -8;"			/* r7 = r10 - 8 */
> +		"r8 = r10;"
> +		"r8 += -32;"			/* r8 = r10 - 32 */
> +		/* now spill subprog's return value (a r6 -> r1 -> r0 chain)
> +		 * a few times through different stack pointer regs, making
> +		 * sure to use r10, r7, and r8 both in LDX and STX insns, and
> +		 * *importantly* also using a combination of const var_off and
> +		 * insn->off to validate that we record final stack slot
> +		 * correctly, instead of relying on just insn->off derivation,
> +		 * which is only valid for r10-based stack offset
> +		 */
> +		"*(u64 *)(r10 - 16) = r0;"
> +		"r0 = *(u64 *)(r7 - 8);"	/* r7 - 8 == r10 - 16 */
> +		"*(u64 *)(r8 + 16) = r0;"	/* r8 + 16 = r10 - 16 */
> +		"r0 = *(u64 *)(r8 + 16);"
> +		"*(u64 *)(r7 - 8) = r0;"
> +		"r0 = *(u64 *)(r10 - 16);"
> +		/* get ready to use r0 as an index into array to force precision */
> +		"r0 *= 4;"
> +		"r1 = %[vals];"
> +		/* here r0->r1->r6 chain is forced to be precise and has to be
> +		 * propagated back to the beginning, including through the
> +		 * subprog call and all the stack spills and loads
> +		 */
> +		"r1 += r0;"
> +		"r0 = *(u32 *)(r1 + 0);"
>  		"exit;"
> +		:
> +		: __imm_ptr(vals)
> +		: __clobber_common, "r6"
>  	);
>  }
>  
> diff --git a/tools/testing/selftests/bpf/verifier/precise.c b/tools/testing/selftests/bpf/verifier/precise.c
> index 0d84dd1f38b6..8a2ff81d8350 100644
> --- a/tools/testing/selftests/bpf/verifier/precise.c
> +++ b/tools/testing/selftests/bpf/verifier/precise.c
> @@ -140,10 +140,11 @@
>  	.result = REJECT,
>  },
>  {
> -	"precise: ST insn causing spi > allocated_stack",
> +	"precise: ST zero to stack insn is supported",
>  	.insns = {
>  	BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
>  	BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
> +	/* not a register spill, so we stop precision propagation for R4 here */
>  	BPF_ST_MEM(BPF_DW, BPF_REG_3, -8, 0),
>  	BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
>  	BPF_MOV64_IMM(BPF_REG_0, -1),
> @@ -157,11 +158,11 @@
>  	mark_precise: frame0: last_idx 4 first_idx 2\
>  	mark_precise: frame0: regs=r4 stack= before 4\
>  	mark_precise: frame0: regs=r4 stack= before 3\
> -	mark_precise: frame0: regs= stack=-8 before 2\
> -	mark_precise: frame0: falling back to forcing all scalars precise\
> -	force_precise: frame0: forcing r0 to be precise\
>  	mark_precise: frame0: last_idx 5 first_idx 5\
> -	mark_precise: frame0: parent state regs= stack=:",
> +	mark_precise: frame0: parent state regs=r0 stack=:\
> +	mark_precise: frame0: last_idx 4 first_idx 2\
> +	mark_precise: frame0: regs=r0 stack= before 4\
> +	5: R0=-1 R4=0",
>  	.result = VERBOSE_ACCEPT,
>  	.retval = -1,
>  },
> @@ -169,6 +170,8 @@
>  	"precise: STX insn causing spi > allocated_stack",
>  	.insns = {
>  	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
> +	/* make later reg spill more interesting by having somewhat known scalar */
> +	BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xff),
>  	BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
>  	BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
>  	BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, -8),
> @@ -179,18 +182,21 @@
>  	},
>  	.prog_type = BPF_PROG_TYPE_XDP,
>  	.flags = BPF_F_TEST_STATE_FREQ,
> -	.errstr = "mark_precise: frame0: last_idx 6 first_idx 6\
> +	.errstr = "mark_precise: frame0: last_idx 7 first_idx 7\
>  	mark_precise: frame0: parent state regs=r4 stack=:\
> -	mark_precise: frame0: last_idx 5 first_idx 3\
> -	mark_precise: frame0: regs=r4 stack= before 5\
> -	mark_precise: frame0: regs=r4 stack= before 4\
> -	mark_precise: frame0: regs= stack=-8 before 3\
> -	mark_precise: frame0: falling back to forcing all scalars precise\
> -	force_precise: frame0: forcing r0 to be precise\
> -	force_precise: frame0: forcing r0 to be precise\
> -	force_precise: frame0: forcing r0 to be precise\
> -	force_precise: frame0: forcing r0 to be precise\
> -	mark_precise: frame0: last_idx 6 first_idx 6\
> +	mark_precise: frame0: last_idx 6 first_idx 4\
> +	mark_precise: frame0: regs=r4 stack= before 6: (b7) r0 = -1\
> +	mark_precise: frame0: regs=r4 stack= before 5: (79) r4 = *(u64 *)(r10 -8)\
> +	mark_precise: frame0: regs= stack=-8 before 4: (7b) *(u64 *)(r3 -8) = r0\
> +	mark_precise: frame0: parent state regs=r0 stack=:\
> +	mark_precise: frame0: last_idx 3 first_idx 3\
> +	mark_precise: frame0: regs=r0 stack= before 3: (55) if r3 != 0x7b goto pc+0\
> +	mark_precise: frame0: regs=r0 stack= before 2: (bf) r3 = r10\
> +	mark_precise: frame0: regs=r0 stack= before 1: (57) r0 &= 255\
> +	mark_precise: frame0: parent state regs=r0 stack=:\
> +	mark_precise: frame0: last_idx 0 first_idx 0\
> +	mark_precise: frame0: regs=r0 stack= before 0: (85) call bpf_get_prandom_u32#7\
> +	mark_precise: frame0: last_idx 7 first_idx 7\
>  	mark_precise: frame0: parent state regs= stack=:",
>  	.result = VERBOSE_ACCEPT,
>  	.retval = -1,




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
  2023-10-31  5:03 ` [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return Andrii Nakryiko
@ 2023-11-09 15:20   ` Eduard Zingerman
  2023-11-09 17:32     ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > Given that the verifier checks the actual value, r0 has to be precise,
> > so we need to propagate precision properly.
> > 
> > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

I don't follow why this is necessary; could you please conjure
an example showing that the current behavior is not safe?
Such an example could be used as a test case, as this change
does not seem to be covered by the existing tests.

> > ---
> >  kernel/bpf/verifier.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index fbb779583d52..098ba0e1a6ff 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -9739,6 +9739,12 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
> >  			verbose(env, "R0 not a scalar value\n");
> >  			return -EACCES;
> >  		}
> > +
> > +		/* we are going to enforce precise value, mark r0 precise */
> > +		err = mark_chain_precision(env, BPF_REG_0);
> > +		if (err)
> > +			return err;
> > +
> >  		if (!tnum_in(range, r0->var_off)) {
> >  			verbose_invalid_scalar(env, r0, &range, "callback return", "R0");
> >  			return -EINVAL;


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer
  2023-10-31  5:03 ` [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
@ 2023-11-09 15:20   ` Eduard Zingerman
  0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > When a register is spilled onto the stack as a 1/2/4-byte register, we set
> > slot_type[BPF_REG_SIZE - 1] (plus potentially a few more below it,
> > depending on the actual spill size). So to check whether some stack slot
> > holds a spilled register we need to consult slot_type[7], not slot_type[0].
> > 
> > To avoid the need to remember and double-check this in the future, just
> > use the is_spilled_reg() helper.
> > 
> > Fixes: 638f5b90d460 ("bpf: reduce verifier memory consumption")
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
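
FWIW, an illustration of the slot layout the commit message describes
(my sketch; '?' stands for whatever marking was there before), for a
2-byte register spill into an 8-byte stack slot:

	/* slot_type index:    0  1  2  3  4  5    6      7    */
	/* after 2-byte spill: ?, ?, ?, ?, ?, ?, SPILL, SPILL  */

so only slot_type[BPF_REG_SIZE - 1] is guaranteed to say STACK_SPILL,
which is exactly the byte is_spilled_reg() consults.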

> > ---
> >  kernel/bpf/verifier.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 098ba0e1a6ff..82992c32c1bd 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -4622,7 +4622,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> >  	 * so it's aligned access and [off, off + size) are within stack limits
> >  	 */
> >  	if (!env->allow_ptr_leaks &&
> > -	    state->stack[spi].slot_type[0] == STACK_SPILL &&
> > +	    is_spilled_reg(&state->stack[spi]) &&
> >  	    size != BPF_REG_SIZE) {
> >  		verbose(env, "attempt to corrupt spilled pointer on stack\n");
> >  		return -EACCES;


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills
  2023-10-31  5:03 ` [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
@ 2023-11-09 15:20   ` Eduard Zingerman
  2023-11-09 17:37     ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve them in
> situations where this is possible. E.g., when spilling a register as a
> 1/2/4-byte subslot on the stack, the remaining bytes in the stack
> slot do not automatically become unknown. If we knew they contained
> zeroes, we can preserve those STACK_ZERO markers.
> 
> Add a helper mark_stack_slot_misc(), similar to scrub_spilled_slot(),
> but one that doesn't overwrite either STACK_INVALID or STACK_ZERO. Note
> that we need to take into account the possibility of being in unprivileged
> mode, in which case STACK_INVALID is forced to STACK_MISC for correctness,
> as treating STACK_INVALID as equivalent to STACK_MISC is only enabled in
> privileged mode.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Could you please add a test case?

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> @@ -1355,6 +1355,21 @@ static void scrub_spilled_slot(u8 *stype)
>  		*stype = STACK_MISC;
>  }
>  
> +/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
> + * case they are equivalent, or it's STACK_ZERO, in which case we preserve
> + * more precise STACK_ZERO.
> + * Note, in unprivileged mode leaving STACK_INVALID is wrong, so we take
> + * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
> + */
> +static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)

Nitpick: I find this name misleading, maybe something like "remove_spill_mark"?
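
(For context, my reading of the intended behavior, as a sketch; the
actual hunk is trimmed below:)

	static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
	{
		if (*stype == STACK_ZERO)
			return;	/* keep more precise STACK_ZERO */
		if (env->allow_ptr_leaks && *stype == STACK_INVALID)
			return;	/* STACK_INVALID is OK only in privileged mode */
		*stype = STACK_MISC;
	}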

[...]



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore
  2023-10-31  5:03 ` [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
@ 2023-11-09 15:20   ` Eduard Zingerman
  2023-11-09 17:41     ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> Similar to the special handling of STACK_ZERO, when reading 1/2/4 bytes from
> a stack slot that has a register spilled into it and that register has
> a constant value of zero, preserve that zero and mark the spilled register
> as precise. This makes the spilled const zero register and STACK_ZERO
> cases equivalent in their behavior.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Could you please add a test case?

[...]

> ---
>  kernel/bpf/verifier.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0eecc6b3109c..8cfe060e4938 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4958,22 +4958,39 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
>  				copy_register_state(&state->regs[dst_regno], reg);
>  				state->regs[dst_regno].subreg_def = subreg_def;
>  			} else {
[...]
> +
> +				if (spill_cnt == size &&
> +				    tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
> +					__mark_reg_const_zero(&state->regs[dst_regno]);
> +					/* this IS register fill, so keep insn_flags */
> +				} else if (zero_cnt == size) {
> +					/* similarly to mark_reg_stack_read(), preserve zeroes */
> +					__mark_reg_const_zero(&state->regs[dst_regno]);
> +					insn_flags = 0; /* not restoring original register state */
> +				} else {
> +					mark_reg_unknown(env, state->regs, dst_regno);
> +					insn_flags = 0; /* not restoring original register state */
> +				}

The condition for this branch is (off % BPF_REG_SIZE != 0) || size != spill_size;
is it necessary to check for some unusual offsets, e.g. off % BPF_REG_SIZE == 7
or something like that?
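
(For concreteness, the kind of unusual offset I mean, written as a
hypothetical test snippet:)

	"r0 = 0;"
	"*(u64 *)(r10 - 8) = r0;"	/* full 8-byte spill of const zero */
	"r1 = *(u8 *)(r10 - 3);"	/* 1-byte read from the middle of the slot */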

[...]



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-10-31  5:03 ` [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
  2023-10-31  5:22   ` Andrii Nakryiko
@ 2023-11-09 15:21   ` Eduard Zingerman
  2023-11-09 17:43     ` Andrii Nakryiko
  1 sibling, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:21 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> track aligned STACK_ZERO cases as imprecise spilled registers

Great improvement.
Could you please add a test case?

Acked-by: Eduard Zingerman <eddyz87@gmail.com>



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-11-09 15:20   ` Eduard Zingerman
@ 2023-11-09 16:13     ` Alexei Starovoitov
  2023-11-09 17:28       ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2023-11-09 16:13 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > Instead of allocating and copying jump history each time we enqueue
> > child verifier state, switch to a model where we use one common
> > dynamically sized array of instruction jumps across all states.
> >
> > The key observation for proving this is correct is that jmp_history is
> > only relevant while state is active, which means it either is a current
> > state (and thus we are actively modifying jump history and no other
> > state can interfere with us) or it is a checkpointed state with some
> > children still active (either enqueued or being current).
> >
> > In the latter case our portion of jump history is finalized and won't
> > change or grow, so as long as we keep it immutable until the state is
> > finalized, we are good.
> >
> > Now, when state is finalized and is put into state hash for potentially
> > future pruning lookups, jump history is not used anymore. This is
> > because jump history is only used by precision marking logic, and we
> > never modify precision markings for finalized states.
> >
> > So, instead of each state having its own small jump history, we keep
> > a global dynamically-sized jump history, where each state in current DFS
> > path from root to active state remembers its portion of jump history.
> > Current state can append to this history, but cannot modify any of its
> > parent histories.
> >
> > Because the jmp_history array can be grown through realloc, states don't
> > keep pointers, they instead maintain two indexes [start, end) into
> > global jump history array. End is exclusive index, so start == end means
> > there is no relevant jump history.
> >
> > This should eliminate a lot of allocations and minimize overall memory
> > usage (but I haven't benchmarked on real hardware, and QEMU benchmarking
> > is too noisy).
> >
> > Also, in the next patch we'll extend jump history to maintain additional
> > markings for some instructions even if there was no jump, so in
> > preparation for that call this thing a more generic "instruction history".
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Nitpick: could you please add a comment somewhere in the code
> (is_state_visited? pop_stack?) saying something like this:
>
>   states in the env->head happen to be sorted by insn_hist_end in
>   descending order, so popping next state for verification poses no
>   risk of overwriting history relevant for states remaining in
>   env->head.
>
> Side note: this change would make it harder to change the states traversal
> order to something other than DFS, should we choose to do so.

I have the same concern.

When we discussed different algorithms to solve open-coded-iters/bpf_loop
issue non-DFS ideas came up multiple times.
To be fair I didn't like them, because I wanted to preserve the DFS property :)
but I feel sooner or later we will be forced to explore non-DFS.
So I think this patch is a no-go. There is really no need to rely on DFS here.
Let instruction history consume more memory. It's a better long-term trade-off.
We don't do strict DFS today.
The speculative execution analysis is DFS, but it visits paths
multiple times, so it's not a canonical DFS.
It probably doesn't break this particular insn_hist approach,
but it still feels too fragile to rely on the DFS assumption long term.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
  2023-11-09 15:20   ` Eduard Zingerman
@ 2023-11-09 17:20     ` Andrii Nakryiko
  2023-11-09 18:20       ` Eduard Zingerman
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:20 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
	Tao Lyu

On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
>
> All makes sense, a few nitpicks below.
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > +/* instruction history flags, used in bpf_insn_hist_entry.flags field */
> > +enum {
> > +     /* instruction references stack slot through PTR_TO_STACK register;
> > +      * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
> > +      * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512,
> > +      * 8 bytes per slot, so slot index (spi) is [0, 63])
> > +      */
> > +     INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */
> > +
> > +     INSN_F_SPI_MASK = 0x3f, /* 6 bits */
> > +     INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */
> > +
> > +     INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */
> > +};
> > +
> > +static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
> > +static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
> > +
> >  struct bpf_insn_hist_entry {
> > -     u32 prev_idx;
> >       u32 idx;
> > +     /* insn idx can't be bigger than 1 million */
> > +     u32 prev_idx : 22;
> > +     /* special flags, e.g., whether insn is doing register stack spill/load */
> > +     u32 flags : 10;
> >  };
>
> Nitpick: maybe use separate bit-fields for frameno and spi instead of
>          flags? Or add dedicated accessor functions?

I wanted to keep it very uniform so that push_insn_history() doesn't
know about all such details. It just has "flags". We might use these
flags for some other use cases, though if we run out of bits we'll
probably just expand bpf_insn_hist_entry and refactor existing code
anyway. So, basically, I didn't want to over-engineer this bit too
much :)

>
> >
> > -#define MAX_CALL_FRAMES 8
> >  /* Maximum number of register states that can exist at once */
> >  #define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES)
> >  struct bpf_verifier_state {
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 2905ce2e8b34..fbb779583d52 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -3479,14 +3479,20 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
> >  }
> >
> >  /* for any branch, call, exit record the history of jmps in the given state */
> > -static int push_jmp_history(struct bpf_verifier_env *env,
> > -                         struct bpf_verifier_state *cur)
> > +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> > +                          int insn_flags)
> >  {
> >       struct bpf_insn_hist_entry *p;
> >       size_t alloc_size;
> >
> > -     if (!is_jmp_point(env, env->insn_idx))
> > +     /* combine instruction flags if we already recorded this instruction */
> > +     if (cur->insn_hist_end > cur->insn_hist_start &&
> > +         (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> > +         p->idx == env->insn_idx &&
> > +         p->prev_idx == env->prev_insn_idx) {
> > +             p->flags |= insn_flags;
>
> Nitpick: maybe add an assert to check that frameno/spi are not or'ed?

ok, something like

WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS | INSN_F_FRAMENO_MASK |
			 (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));

?

>
> [...]
>
> > +static struct bpf_insn_hist_entry *get_hist_insn_entry(struct bpf_verifier_env *env,
> > +                                                    u32 hist_start, u32 hist_end, int insn_idx)
>
> Nitpick: maybe rename 'hist_insn' to 'insn_hist', i.e. 'get_insn_hist_entry'?

sure, good point, done

>
> [...]
>
> > @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> >
> >               /* Mark slots affected by this stack write. */
> >               for (i = 0; i < size; i++)
> > -                     state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> > -                             type;
> > +                     state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> > +             insn_flags = 0; /* not a register spill */
> >       }
> > +
> > +     if (insn_flags)
> > +             return push_insn_history(env, env->cur_state, insn_flags);
>
> Maybe add a check that insn is BPF_ST or BPF_STX here?
> Only these cases are supported by backtrack_insn() while
> check_mem_access() is called from multiple places.

seems like the wrong place to enforce that check_stack_write_fixed_off()
is called only for those instructions?

>
> >       return 0;
> >  }
> >
> > @@ -4908,6 +4909,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> >       int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
> >       struct bpf_reg_state *reg;
> >       u8 *stype, type;
> > +     int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | reg_state->frameno;
> >
> >       stype = reg_state->stack[spi].slot_type;
> >       reg = &reg_state->stack[spi].spilled_ptr;

[...]

trimming is good

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-11-09 16:13     ` Alexei Starovoitov
@ 2023-11-09 17:28       ` Andrii Nakryiko
  2023-11-09 19:29         ` Alexei Starovoitov
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 8:14 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > Instead of allocating and copying jump history each time we enqueue
> > > child verifier state, switch to a model where we use one common
> > > dynamically sized array of instruction jumps across all states.
> > >
> > > The key observation for proving this is correct is that jmp_history is
> > > only relevant while state is active, which means it either is a current
> > > state (and thus we are actively modifying jump history and no other
> > > state can interfere with us) or it is a checkpointed state with some
> > > children still active (either enqueued or being current).
> > >
> > > In the latter case our portion of jump history is finalized and won't
> > > change or grow, so as long as we keep it immutable until the state is
> > > finalized, we are good.
> > >
> > > Now, when state is finalized and is put into state hash for potentially
> > > future pruning lookups, jump history is not used anymore. This is
> > > because jump history is only used by precision marking logic, and we
> > > never modify precision markings for finalized states.
> > >
> > > So, instead of each state having its own small jump history, we keep
> > > a global dynamically-sized jump history, where each state in current DFS
> > > path from root to active state remembers its portion of jump history.
> > > Current state can append to this history, but cannot modify any of its
> > > parent histories.
> > >
> > > Because the jmp_history array can be grown through realloc, states don't
> > > keep pointers, they instead maintain two indexes [start, end) into
> > > global jump history array. End is exclusive index, so start == end means
> > > there is no relevant jump history.
> > >
> > > This should eliminate a lot of allocations and minimize overall memory
> > > usage (but I haven't benchmarked on real hardware, and QEMU benchmarking
> > > is too noisy).
> > >
> > > Also, in the next patch we'll extend jump history to maintain additional
> > > markings for some instructions even if there was no jump, so in
> > > preparation for that call this thing a more generic "instruction history".
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > Nitpick: could you please add a comment somewhere in the code
> > (is_state_visited? pop_stack?) saying something like this:
> >
> >   states in the env->head happen to be sorted by insn_hist_end in
> >   descending order, so popping next state for verification poses no
> >   risk of overwriting history relevant for states remaining in
> >   env->head.
> >
> > Side note: this change would make it harder to change the states traversal
> > order to something other than DFS, should we choose to do so.
>
> I have the same concern.
>
> When we discussed different algorithms to solve open-coded-iters/bpf_loop
> issue non-DFS ideas came up multiple times.
> To be fair I didn't like them, because I wanted to preserve DFS property :)
> but I feel sooner or later we will be forced to explore non-DFS.
> So I think this patch is no go. There is really no need to rely on DFS here.

If we ever break the DFS property, we can easily change this. Or we can
even have a hybrid: as long as the traversal preserves the DFS property,
we use the global shared history, but a state can also optionally clone
and keep its own history if necessary. It's a matter of adding an
optional, potentially NULL pointer to a "local history". All of this is
very nicely hidden away from "normal" code.
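
Roughly like this (hypothetical sketch; field names invented except the
insn_hist_* ones from this patch):

	struct bpf_verifier_state {
		/* ... existing fields ... */
		/* [start, end) into the shared env->insn_hist array */
		u32 insn_hist_start;
		u32 insn_hist_end;
		/* non-NULL only if this state had to clone its own history */
		struct bpf_insn_hist_entry *local_hist;
	};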

> Let instruction history consume more memory. It's a better long term trade off.

Before we decide this, let me collect stats on how much memory we use
for jmp_history with and without my change. I'll need to add a bit of
temporary code to veristat and verifier to collect this, but it
shouldn't take much effort. OK?

> We don't do strict DFS today.
> The speculative execution analysis is DFS, but it visits paths
> multiple times, so it's not a canonical DFS.

Not sure I follow. It's still a DFS, we just branch out more.

But again, let's look at data first. I'll get back with numbers soon.

> It probably doesn't break this particular insn_hist approach,
> but still feels too fragile to rely on DFS assumption long term.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
  2023-11-09 15:20   ` Eduard Zingerman
@ 2023-11-09 17:32     ` Andrii Nakryiko
  2023-11-09 17:38       ` Eduard Zingerman
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:32 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > Given verifier checks actual value, r0 has to be precise, so we need to
> > > propagate precision properly.
> > >
> > > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> I don't follow why this is necessary, could you please conjure
> an example showing that current behavior is not safe?
> This example could be used as a test case, as this change
> seems to not be covered by test cases.

We rely on callbacks to return a specific value (0 or 1, for example),
and use or might use that value in kernel code. So if we rely on the
specific value of a register, it has to be precise. Marking r0 as
precise will have implications on other registers from which r0 was
derived. This might have implications on state pruning and stuff. If
r0 and its ancestors are not precise, we might erroneously assume some
states are safe and prune them, even though they are not.

I'll see if I can come up with a simple and quick test. I can always
drop this change, it was a bit of a drive-by fix for a bug I noticed
while looking for other issues.

>
> > > ---
> > >  kernel/bpf/verifier.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index fbb779583d52..098ba0e1a6ff 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -9739,6 +9739,12 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
> > >                     verbose(env, "R0 not a scalar value\n");
> > >                     return -EACCES;
> > >             }
> > > +
> > > +           /* we are going to enforce precise value, mark r0 precise */
> > > +           err = mark_chain_precision(env, BPF_REG_0);
> > > +           if (err)
> > > +                   return err;
> > > +
> > >             if (!tnum_in(range, r0->var_off)) {
> > >                     verbose_invalid_scalar(env, r0, &range, "callback return", "R0");
> > >                     return -EINVAL;
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills
  2023-11-09 15:20   ` Eduard Zingerman
@ 2023-11-09 17:37     ` Andrii Nakryiko
  2023-11-09 17:54       ` Eduard Zingerman
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:37 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve it in
> > situations where this is possible. E.g., when spilling register as
> > 1/2/4-byte subslots on the stack, all the remaining bytes in the stack
> > slot do not automatically become unknown. If we knew they contained
> > zeroes, we can preserve those STACK_ZERO markers.
> >
> > Add a helper mark_stack_slot_misc(), similar to scrub_spilled_slot(),
> > but that doesn't overwrite either STACK_INVALID nor STACK_ZERO. Note
> > that we need to take into account possibility of being in unprivileged
> > mode, in which case STACK_INVALID is forced to STACK_MISC for correctness,
> > as treating STACK_INVALID as equivalent to STACK_MISC is only enabled in
> > privileged mode.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Could you please add a test case?
>

sure

> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > @@ -1355,6 +1355,21 @@ static void scrub_spilled_slot(u8 *stype)
> >               *stype = STACK_MISC;
> >  }
> >
> > +/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
> > + * case they are equivalent, or it's STACK_ZERO, in which case we preserve
> > + * more precise STACK_ZERO.
> > > + * Note, in unprivileged mode leaving STACK_INVALID is wrong, so we take
> > + * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
> > + */
> > +static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
>
> Nitpick: I find this name misleading, maybe something like "remove_spill_mark"?

remove_spill_mark is even more misleading, no? there are also DYNPTR
and ITER stack slots?

maybe mark_stack_slot_scalar (though that's a bit misleading as well,
as it can be understood as marking the slot as a spilled SCALAR_VALUE
register)? not sure, I think "slot_misc" is close enough as an
approximation of what it's doing, modulo ZERO/INVALID

>
> [...]
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
  2023-11-09 17:32     ` Andrii Nakryiko
@ 2023-11-09 17:38       ` Eduard Zingerman
  2023-11-09 17:50         ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 17:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, 2023-11-09 at 09:32 -0800, Andrii Nakryiko wrote:
> On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > > Given verifier checks actual value, r0 has to be precise, so we need to
> > > > propagate precision properly.
> > > > 
> > > > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > 
> > I don't follow why this is necessary, could you please conjure
> > an example showing that current behavior is not safe?
> > This example could be used as a test case, as this change
> > seems to not be covered by test cases.
> 
> We rely on callbacks to return specific value (0 or 1, for example),
> and use or might use that in kernel code. So if we rely on the
> specific value of a register, it has to be precise. Marking r0 as
> precise will have implications on other registers from which r0 was
> derived. This might have implications on state pruning and stuff. If
> r0 and its ancestors are not precise, we might erroneously assume some
> states are safe and prune them, even though they are not.

The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
bpf_loop returns the number of completed iterations. However, the return
value of bpf_loop as modeled by the verifier is an unbounded scalar.
Same for the map's for-each helper.

I'm not sure we have callback-calling functions that can expose this as a
safety issue.

[...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore
  2023-11-09 15:20   ` Eduard Zingerman
@ 2023-11-09 17:41     ` Andrii Nakryiko
  2023-11-09 19:34       ` Eduard Zingerman
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:41 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, Nov 9, 2023 at 7:21 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
> > stack from slot that has register spilled into it and that register has
> > a constant value zero, preserve that zero and mark spilled register as
> > precise for that. This makes spilled const zero register and STACK_ZERO
> > cases equivalent in their behavior.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Could you please add a test case?
>

There is already at least one test case that relies on this behavior
:) But yep, I'll add a dedicated test.

> [...]
>
> > ---
> >  kernel/bpf/verifier.c | 25 +++++++++++++++++++++----
> >  1 file changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 0eecc6b3109c..8cfe060e4938 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -4958,22 +4958,39 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> >                               copy_register_state(&state->regs[dst_regno], reg);
> >                               state->regs[dst_regno].subreg_def = subreg_def;
> >                       } else {
> [...]
> > +
> > +                             if (spill_cnt == size &&
> > +                                 tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
> > +                                     __mark_reg_const_zero(&state->regs[dst_regno]);
> > +                                     /* this IS register fill, so keep insn_flags */
> > +                             } else if (zero_cnt == size) {
> > +                                     /* similarly to mark_reg_stack_read(), preserve zeroes */
> > +                                     __mark_reg_const_zero(&state->regs[dst_regno]);
> > +                                     insn_flags = 0; /* not restoring original register state */
> > +                             } else {
> > +                                     mark_reg_unknown(env, state->regs, dst_regno);
> > +                                     insn_flags = 0; /* not restoring original register state */
> > +                             }
>
> Condition for this branch is (off % BPF_REG_SIZE != 0) || size != spill_size,
> is it necessary to check for some unusual offsets, e.g. off % BPF_REG_SIZE == 7
> or something like that?

I don't think so. We rely on all bytes we are reading to be either
spills (and thus spill_cnt == size), in which case verifier logic
makes sure we have a spill at a slot boundary (off % BPF_REG_SIZE == 0).
Or it's all STACK_ZERO, and then zero_cnt == size, in which case we
know it's zero.

Unless I missed something else?
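
A compact restatement of that invariant as a sketch (hypothetical
helper, not actual verifier code):

/* A 1/2/4-byte read is either fully covered by one spilled register
 * (spills are written at 8-byte slot boundaries, hence
 * off % BPF_REG_SIZE == 0) or fully covered by STACK_ZERO bytes;
 * mixed cases fall through and the result is treated as unknown.
 */
static bool partial_fill_is_zero(int spill_cnt, int zero_cnt, int size,
				 u64 spilled_const)
{
	if (spill_cnt == size)		/* all bytes from one spill */
		return spilled_const == 0;
	if (zero_cnt == size)		/* all bytes are STACK_ZERO */
		return true;
	return false;
}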

>
> [...]
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-11-09 15:21   ` Eduard Zingerman
@ 2023-11-09 17:43     ` Andrii Nakryiko
  2023-11-09 17:44       ` Eduard Zingerman
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:43 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, Nov 9, 2023 at 7:21 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > track aligned STACK_ZERO cases as imprecise spilled registers
>
> Great improvement.

thanks!

> Could you please add a test case?

sure, though I guess I'd have to rely on verifier state printing logic
for this, is that ok?

>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-11-09 17:43     ` Andrii Nakryiko
@ 2023-11-09 17:44       ` Eduard Zingerman
  0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 17:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, 2023-11-09 at 09:43 -0800, Andrii Nakryiko wrote:
> sure, though I guess I'd have to rely on verifier state printing logic
> for this, is that ok?

Sure, thank you.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
  2023-11-09 17:38       ` Eduard Zingerman
@ 2023-11-09 17:50         ` Andrii Nakryiko
  2023-11-09 17:58           ` Alexei Starovoitov
  2023-11-09 18:00           ` Eduard Zingerman
  0 siblings, 2 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:50 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, Nov 9, 2023 at 9:38 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2023-11-09 at 09:32 -0800, Andrii Nakryiko wrote:
> > On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > > > Given verifier checks actual value, r0 has to be precise, so we need to
> > > > > propagate precision properly.
> > > > >
> > > > > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > >
> > > I don't follow why this is necessary, could you please conjure
> > > an example showing that current behavior is not safe?
> > > This example could be used as a test case, as this change
> > > seems to not be covered by test cases.
> >
> > We rely on callbacks to return a specific value (0 or 1, for example),
> > and use or might use that value in kernel code. So if we rely on the
> > specific value of a register, it has to be precise. Marking r0 as
> > precise will have implications on other registers from which r0 was
> > derived. This might have implications on state pruning and stuff. If
> > r0 and its ancestors are not precise, we might erroneously assume some
> > states are safe and prune them, even though they are not.
>
> The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
> bpf_loop returns the number of completed iterations. However, the return
> value of bpf_loop as modeled by the verifier is an unbounded scalar.
> Same for the map's for-each helper.

The return value of bpf_loop() is a different thing from the return value
of bpf_loop's callback. Right now the bpf_loop implementation in the kernel does

ret = callback(...);
/* return value: 0 - continue, 1 - stop and return */
if (ret)
   return i + 1;

So yes, it doesn't explicitly rely on the return value being 1, just due to
the above implementation. But the verifier is meant to enforce that, and
the protocol is that bpf_loop and other callback-calling helpers
should be able to rely on this value.
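
For reference, a slightly fuller sketch of that loop, paraphrasing the
in-kernel bpf_loop helper from kernel/bpf/bpf_iter.c (exact signature
and bounds checks may differ):

typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);

static u64 bpf_loop_sketch(u32 nr_loops, bpf_callback_t callback, void *ctx)
{
	u64 ret;
	u32 i;

	for (i = 0; i < nr_loops; i++) {
		ret = callback((u64)i, (u64)(long)ctx, 0, 0, 0);
		/* return value: 0 - continue, 1 - stop and return;
		 * note that any non-zero value stops the loop today
		 */
		if (ret)
			return i + 1;
	}
	return i;
}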

I think we have the same problem in check_return_code() for entry BPF
programs. So let me take this one out of this patch set and post a
new one concentrating on this particular issue. I've been meaning to
use umin/umax for return value checking anyways, so it might be a good
time to do that as well.

>
> I'm not sure we have callback-calling functions that can expose this as a
> safety issue.

Even if we can't exploit it today, it's breaking the protocol and
guarantees that the verifier provides, so I think this needs to be fixed.

>
> [...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills
  2023-11-09 17:37     ` Andrii Nakryiko
@ 2023-11-09 17:54       ` Eduard Zingerman
  0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 17:54 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, 2023-11-09 at 09:37 -0800, Andrii Nakryiko wrote:
[...]
> > > @@ -1355,6 +1355,21 @@ static void scrub_spilled_slot(u8 *stype)
> > >               *stype = STACK_MISC;
> > >  }
> > > 
> > > +/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
> > > + * case they are equivalent, or it's STACK_ZERO, in which case we preserve
> > > + * more precise STACK_ZERO.
> > > + * Note, in unprivileged mode leaving STACK_INVALID is wrong, so we take
> > > + * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
> > > + */
> > > +static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
> > 
> > Nitpick: I find this name misleading, maybe something like "remove_spill_mark"?
> 
> remove_spill_mark is even more misleading, no? there are also DYNPTR
> and ITER stack slots?

Right, forgot about those...

> 
> maybe mark_stack_slot_scalar (though that's a bit misleading as well,
> as it can be understood as marking the slot as a spilled SCALAR_VALUE
> register)? not sure, I think "slot_misc" is close enough as an
> approximation of what it's doing, modulo ZERO/INVALID

maybe_mark_stack_slot_misc?
The other similar function is named 'scrub_spilled_slot'. 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
  2023-11-09 17:50         ` Andrii Nakryiko
@ 2023-11-09 17:58           ` Alexei Starovoitov
  2023-11-09 18:01             ` Andrii Nakryiko
  2023-11-09 18:00           ` Eduard Zingerman
  1 sibling, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2023-11-09 17:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 9:50 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> >
> > The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
> > bpf_loop returns the number of completed iterations. However, the return
> > value of bpf_loop as modeled by the verifier is an unbounded scalar.
> > Same for the map's for-each helper.
>
> > The return value of bpf_loop() is a different thing from the return value
> > of bpf_loop's callback. Right now the bpf_loop implementation in the kernel does
>
> ret = callback(...);
> /* return value: 0 - continue, 1 - stop and return */
> if (ret)
>    return i + 1;
>
> > So yes, it doesn't explicitly rely on the return value being 1, just due to
> > the above implementation. But the verifier is meant to enforce that, and
> > the protocol is that bpf_loop and other callback-calling helpers
> > should be able to rely on this value.
>
> I think we have the same problem in check_return_code() for entry BPF
> > programs. So let me take this one out of this patch set and post a
> > new one concentrating on this particular issue. I've been meaning to
> > use umin/umax for return value checking anyways, so it might be a good
> > time to do that as well.

Just like Ed I was also initially confused by this.
As you said, check_return_code() has the same problem.
I think the issue this patch (and a similar fix in check_return_code())
should be fixing is the case where one state went through
return code checking, but another state with a potentially out-of-range
r0 got pruned since r0 wasn't marked precise.
Not sure how hard it would be to come up with a selftest for such a scenario.
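
For illustration, a hypothetical sketch of the shape such a selftest
might take (names are invented and this is not a verified reproducer,
just the scenario described above, written in BPF C):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

/* Two paths produce different callback return values. Path A returns a
 * constant 0; path B returns a value outside the allowed [0, 1] range.
 * If r0 isn't marked precise when path A passes the return-code check,
 * path B's state may look "equivalent" and get pruned before its
 * out-of-range r0 is ever checked.
 */
static long callback_fn(__u64 index, void *ctx)
{
	__u64 ret;

	if (bpf_get_prandom_u32() & 1)
		ret = 0;		/* path A: always in range */
	else
		ret = index + 2;	/* path B: out of [0, 1] range */

	return ret;
}

SEC("tc")
int prune_hazard(struct __sk_buff *skb)
{
	bpf_loop(8, callback_fn, NULL, 0);
	return 0;
}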

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
  2023-11-09 17:50         ` Andrii Nakryiko
  2023-11-09 17:58           ` Alexei Starovoitov
@ 2023-11-09 18:00           ` Eduard Zingerman
  1 sibling, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 18:00 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, 2023-11-09 at 09:50 -0800, Andrii Nakryiko wrote:
> On Thu, Nov 9, 2023 at 9:38 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Thu, 2023-11-09 at 09:32 -0800, Andrii Nakryiko wrote:
> > > On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > 
> > > > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > > > > Given verifier checks actual value, r0 has to be precise, so we need to
> > > > > > propagate precision properly.
> > > > > > 
> > > > > > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > 
> > > > I don't follow why this is necessary, could you please conjure
> > > > an example showing that current behavior is not safe?
> > > > This example could be used as a test case, as this change
> > > > seems to not be covered by test cases.
> > > 
> > > We rely on callbacks to return a specific value (0 or 1, for example),
> > > and use or might use that value in kernel code. So if we rely on the
> > > specific value of a register, it has to be precise. Marking r0 as
> > > precise will have implications on other registers from which r0 was
> > > derived. This might have implications on state pruning and stuff. If
> > > r0 and its ancestors are not precise, we might erroneously assume some
> > > states are safe and prune them, even though they are not.
> > 
> > The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
> > bpf_loop returns the number of completed iterations. However, the return
> > value of bpf_loop as modeled by the verifier is an unbounded scalar.
> > Same for the map's for-each helper.
> 
> The return value of bpf_loop() is a different thing from the return value
> of bpf_loop's callback. Right now the bpf_loop implementation in the kernel does
> 
> ret = callback(...);
> /* return value: 0 - continue, 1 - stop and return */
> if (ret)
>    return i + 1;
> 
> So yes, it doesn't explicitly rely on the return value being 1, just due to
> the above implementation. But the verifier is meant to enforce that, and
> the protocol is that bpf_loop and other callback-calling helpers
> should be able to rely on this value.
> 
> I think we have the same problem in check_return_code() for entry BPF
> programs. So let me take this one out of this patch set and post a
> new one concentrating on this particular issue. I've been meaning to
> use umin/umax for return value checking anyways, so it might be a good
> time to do that as well.

The precision mark is necessary if the verifier makes some decisions
based on the value, e.g. whether a certain code path would be taken or
whether a specific value would be used as a pointer offset.
Neither is true for existing callbacks: the value returned by a callback
does not affect any verifier decisions.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
  2023-11-09 17:58           ` Alexei Starovoitov
@ 2023-11-09 18:01             ` Andrii Nakryiko
  2023-11-09 18:03               ` Eduard Zingerman
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 18:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 9:58 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 9:50 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > >
> > > The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
> > > bpf_loop returns the number of completed iterations. However, the return
> > > value of bpf_loop as modeled by the verifier is an unbounded scalar.
> > > Same for the map's for-each helper.
> >
> > The return value of bpf_loop() is a different thing from the return value
> > of bpf_loop's callback. Right now the bpf_loop implementation in the kernel does
> >
> > ret = callback(...);
> > /* return value: 0 - continue, 1 - stop and return */
> > if (ret)
> >    return i + 1;
> >
> > So yes, it doesn't explicitly rely on the return value being 1, just due to
> > the above implementation. But the verifier is meant to enforce that, and
> > the protocol is that bpf_loop and other callback-calling helpers
> > should be able to rely on this value.
> >
> > I think we have the same problem in check_return_code() for entry BPF
> > programs. So let me take this one out of this patch set and post a
> > new one concentrating on this particular issue. I've been meaning to
> > use umin/umax for return value checking anyways, so it might be a good
> > time to do that as well.
>
> Just like Ed I was also initially confused by this.
> As you said, check_return_code() has the same problem.
> I think the issue this patch (and a similar fix in check_return_code())
> should be fixing is the case where one state went through
> return code checking, but another state with a potentially out-of-range
> r0 got pruned since r0 wasn't marked precise.

Right.

> Not sure how hard it would be to come up with a selftest for such a scenario.

Yep, I'll think of something. Lots of tests to come up with :)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
  2023-11-09 18:01             ` Andrii Nakryiko
@ 2023-11-09 18:03               ` Eduard Zingerman
  0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 18:03 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team

On Thu, 2023-11-09 at 10:01 -0800, Andrii Nakryiko wrote:
[...]
> > Just like Ed I was also initially confused by this.
> > As you said, check_return_code() has the same problem.
> > I think the issue this patch (and a similar fix in check_return_code())
> > should be fixing is the case where one state went through
> > return code checking, but another state with a potentially out-of-range
> > r0 got pruned since r0 wasn't marked precise.
> 
> Right.
> 
> > Not sure how hard it would be to come up with a selftest for such a scenario.
> 
> Yep, I'll think of something. Lots of tests to come up with :)

Hm, range argument is convincing, thank you.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
  2023-11-09 17:20     ` Andrii Nakryiko
@ 2023-11-09 18:20       ` Eduard Zingerman
  2023-11-10  5:48         ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 18:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
	Tao Lyu

On Thu, 2023-11-09 at 09:20 -0800, Andrii Nakryiko wrote:
[...]
> > >  struct bpf_insn_hist_entry {
> > > -     u32 prev_idx;
> > >       u32 idx;
> > > +     /* insn idx can't be bigger than 1 million */
> > > +     u32 prev_idx : 22;
> > > +     /* special flags, e.g., whether insn is doing register stack spill/load */
> > > +     u32 flags : 10;
> > >  };
> > 
> > Nitpick: maybe use separate bit-fields for frameno and spi instead of
> >          flags? Or add dedicated accessor functions?
> 
> I wanted to keep it very uniform so that push_insn_history() doesn't
> know about all such details. It just has "flags". We might use these
> flags for some other use cases, though if we run out of bits we'll
> probably just expand bpf_insn_hist_entry and refactor existing code
> anyways. So, basically, I didn't want to over-engineer this bit too
> much :)

Well, maybe hide "(hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK"
behind an accessor?
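
For instance, a sketch of such accessors (the flag names are the ones
used in this thread; their exact layout here is an assumption):

static u32 insn_hist_spi(const struct bpf_insn_hist_entry *hist)
{
	return (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
}

static u32 insn_hist_frameno(const struct bpf_insn_hist_entry *hist)
{
	return hist->flags & INSN_F_FRAMENOMASK;
}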

[...]

> > > +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> > > +                          int insn_flags)
> > >  {
> > >       struct bpf_insn_hist_entry *p;
> > >       size_t alloc_size;
> > > 
> > > -     if (!is_jmp_point(env, env->insn_idx))
> > > +     /* combine instruction flags if we already recorded this instruction */
> > > +     if (cur->insn_hist_end > cur->insn_hist_start &&
> > > +         (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> > > +         p->idx == env->insn_idx &&
> > > +         p->prev_idx == env->prev_insn_idx) {
> > > +             p->flags |= insn_flags;
> > 
> > Nitpick: maybe add an assert to check that frameno/spi are not or'ed?
> 
> ok, something like
> 
> WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS | INSN_F_FRAMENOMASK |
> (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
> 
> ?

Something like this, yes.

[...]

> > > @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> > > 
> > >               /* Mark slots affected by this stack write. */
> > >               for (i = 0; i < size; i++)
> > > -                     state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> > > -                             type;
> > > +                     state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> > > +             insn_flags = 0; /* not a register spill */
> > >       }
> > > +
> > > +     if (insn_flags)
> > > +             return push_insn_history(env, env->cur_state, insn_flags);
> > 
> > Maybe add a check that insn is BPF_ST or BPF_STX here?
> > Only these cases are supported by backtrack_insn() while
> > check_mem_access() is called from multiple places.
> 
seems like the wrong place to enforce that check_stack_write_fixed_off()
is called only for those instructions?

check_stack_write_fixed_off() is called from check_stack_write(), which
is called from check_mem_access(), which might trigger
check_stack_write_fixed_off() when called with the BPF_WRITE flag and
a pointer to the stack as an argument.
This happens for ST and STX, but also in check_helper_call() and
process_iter_arg() (maybe other places).
Speaking of which, should this be handled in backtrack_insn()?
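
One way such a guard could look, as a sketch (the helper name is
hypothetical; BPF_CLASS() and struct bpf_insn come from the BPF uapi
headers):

#include <linux/bpf.h>	/* struct bpf_insn, BPF_CLASS(), BPF_ST, BPF_STX */

/* true only for the store instruction classes that backtrack_insn()
 * knows how to interpret as register spills
 */
static bool is_stack_store_insn(const struct bpf_insn *insn)
{
	__u8 class = BPF_CLASS(insn->code);

	return class == BPF_ST || class == BPF_STX;
}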

> [...]
> 
> trimming is good

Sigh... sorry, really tried to trim everything today.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-11-09 17:28       ` Andrii Nakryiko
@ 2023-11-09 19:29         ` Alexei Starovoitov
  2023-11-09 19:49           ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2023-11-09 19:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
>
> If we ever break DFS property, we can easily change this. Or we can
> even have a hybrid: as long as traversal preserves DFS property, we
> use global shared history, but we can also optionally clone and have
> our own history if necessary. It's a matter of adding optional
> potentially NULL pointer to "local history". All this is very nicely
> hidden away from "normal" code.

If we can "easily change this" then let's make it last and optional patch.
So we can revert in the future when we need to take non-DFS path.

> But again, let's look at data first. I'll get back with numbers soon.

Sure. I think memory increase due to more tracking is ok.
I suspect it won't cause 2x increase. Likely few %.
The last time I checked the main memory hog is states stashed for pruning.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore
  2023-11-09 17:41     ` Andrii Nakryiko
@ 2023-11-09 19:34       ` Eduard Zingerman
  0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 19:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Thu, 2023-11-09 at 09:41 -0800, Andrii Nakryiko wrote:
> On Thu, Nov 9, 2023 at 7:21 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
> > > stack from slot that has register spilled into it and that register has
> > > a constant value zero, preserve that zero and mark spilled register as
> > > precise for that. This makes spilled const zero register and STACK_ZERO
> > > cases equivalent in their behavior.
> > > 
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > 
> > Could you please add a test case?
> > 
> 
> There is already at least one test case that relies on this behavior
> :) But yep, I'll add a dedicated test.

Thank you. Having a dedicated test always helps with debugging, should
something go wrong.

[...]

> > Condition for this branch is (off % BPF_REG_SIZE != 0) || size != spill_size,
> > is it necessary to check for some unusual offsets, e.g. off % BPF_REG_SIZE == 7
> > or something like that?
> 
> I don't think so. We rely on all bytes we are reading to be either
> spills (and thus spill_cnt == size), in which case verifier logic
> makes sure we have a spill at a slot boundary (off % BPF_REG_SIZE == 0).
> Or it's all STACK_ZERO, and then zero_cnt == size, in which case we
> know it's zero.
> 
> Unless I missed something else?

False alarm, 'slot' is derived from 'off' and the loop checks
'type = stype[(slot - i) % BPF_REG_SIZE];', sorry for the noise.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-11-09 19:29         ` Alexei Starovoitov
@ 2023-11-09 19:49           ` Andrii Nakryiko
  2023-11-09 20:39             ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 19:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> >
> > If we ever break DFS property, we can easily change this. Or we can
> > even have a hybrid: as long as traversal preserves DFS property, we
> > use global shared history, but we can also optionally clone and have
> > our own history if necessary. It's a matter of adding optional
> > potentially NULL pointer to "local history". All this is very nicely
> > hidden away from "normal" code.
>
> If we can "easily change this" then let's make it last and optional patch.
> So we can revert in the future when we need to take non-DFS path.

Ok, sounds good. I'll reorder and put it last, you can decide whether
to apply it or not that way.

>
> > But again, let's look at data first. I'll get back with numbers soon.
>
> Sure. I think memory increase due to more tracking is ok.
> I suspect it won't cause 2x increase. Likely few %.
> The last time I checked the main memory hog is states stashed for pruning.

So I'm back with data. See verifier.c changes I did at the bottom,
just to double check I'm not missing something major. I count the
number of allocations (but that's an underestimate that doesn't take
into account realloc), total number of instruction history entries for
entire program verification, and then also peak "depth" of instruction
history. Note that entries should be multiplied by 8 to get the amount
of bytes (and that's not counting per-allocation overhead).

Here are top 20 results, sorted by number of allocs for Meta-internal,
Cilium, and selftests. BEFORE is without added STACK_ACCESS tracking
and STACK_ZERO optimization. AFTER is with all the patches of this
patch set applied.

It's a few megabytes of memory allocation, which in itself is probably
not a big deal. But we can save a number of unnecessary memory
allocations that is basically at least 2x the total number of
states. And instead we'd have just a few reallocs sizing the global
jump history to a peak entry count that is orders of magnitude smaller.

And if we ever decide to track more stuff similar to
INSN_F_STACK_ACCESS, we won't have to worry about more allocations or
more memory usage, because the absolute worst case is our global
history will be up to 1 million entries tops. We can track some *code
path dependent* per-instruction information for *each simulated
instruction* easily without having to think twice about this, which I
think is a nice, liberating thought that in itself justifies this change.


META BEFORE
===========
[vmuser@archvm bpf]$ sudo veristat -e prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-before-results-fbcode.csv -s jmp_allocs -n 20
Program             Insns   States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
------------------  ------  ------  ---------------  ----------------------  ---------------------
syar_file_open      712974  51407   154982           546559                  742
balancer_ingress    339626  26535   92061            106593                  61
vip_filter          457002  33010   83432            201396                  276
tw_twfw_egress      511127  16733   81485            382989                  4379
tw_twfw_tc_eg       511113  16732   81484            382987                  4379
tw_twfw_ingress     500095  16223   80974            381708                  4379
tw_twfw_tc_in       500095  16223   80974            381708                  4379
adns                384816  11145   41882            128399                  52
cls_fg_dscp         217709  13908   28163            59291                   117
edgewall            179715  12607   26886            51134                   74
mount_audit         87915   1938    19198            104412                  315
xdpdecap            62648   5577    17530            18315                   38
xdpdecap            62648   5577    17530            18315                   38
xdpdecap            62648   5577    17530            18315                   38
xdpdecap            58507   4687    16527            18613                   40
syar_lsm_file_open  167772  1836    12720            90107                   2107
tcdecapstats        38691   3112    10991            12371                   34
twfw_connect6       44399   1974    9864             36320                   1797
twfw_sendmsg6       44399   1974    9864             36320                   1797
on_pytorch_event    100370  2153    7102             25661                   199

META AFTER
==========
[vmuser@archvm bpf]$ sudo veristat -e prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-after-results-fbcode.csv -s jmp_allocs -n 20
Program             Insns   States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
------------------  ------  ------  ---------------  ----------------------  ---------------------
syar_file_open      707473  51263   154488           431397                  464
balancer_ingress    334452  26438   91881            107822                  114
vip_filter          457002  33010   83432            287548                  374
adns                384816  11145   41882            86357                   274
tw_twfw_egress      212071  8504    33886            161518                  5184
tw_twfw_ingress     212069  8504    33886            161515                  5184
tw_twfw_tc_eg       212064  8504    33886            161515                  5184
tw_twfw_tc_in       212069  8504    33886            161515                  5184
cls_fg_dscp         213184  13702   27722            118594                  281
mount_audit         87915   1938    19198            103835                  202
xdpdecap            62648   5577    17530            25723                   96
xdpdecap            62648   5577    17530            25723                   96
xdpdecap            62648   5577    17530            25723                   96
xdpdecap            58507   4687    16527            18674                   54
syar_lsm_file_open  151813  1667    11530            21841                   215
tcdecapstats        38691   3112    10991            12373                   35
twfw_connect6       44399   1974    9864             63516                   4378
twfw_sendmsg6       44399   1974    9864             63516                   4378
edgewall            55783   3999    8467             31874                   187
on_pytorch_event    102649  2152    7140             33563                   250

CILIUM BEFORE
=============
[vmuser@archvm bpf]$ sudo veristat -e file,prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-before-results-cilium.csv -s jmp_allocs -n 20
File           Program                         Insns  States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
-------------  ------------------------------  -----  ------  ---------------  ----------------------  ---------------------
bpf_xdp.o      tail_lb_ipv6                    80441  3647    6976             12471                   105
bpf_xdp.o      tail_lb_ipv4                    39492  2430    4581             8117                    105
bpf_host.o     tail_nodeport_nat_egress_ipv4   22460  1469    2926             5302                    226
bpf_overlay.o  tail_nodeport_nat_egress_ipv4   22718  1475    2926             5285                    227
bpf_host.o     tail_handle_nat_fwd_ipv4        21022  1289    2498             4924                    236
bpf_lxc.o      tail_handle_nat_fwd_ipv4        21022  1289    2498             4924                    236
bpf_overlay.o  tail_handle_nat_fwd_ipv4        20524  1271    2465             4844                    221
bpf_xdp.o      tail_rev_nodeport_lb6           16173  1010    1934             3137                    65
bpf_host.o     tail_handle_nat_fwd_ipv6        15433  905     1802             3662                    224
bpf_lxc.o      tail_handle_nat_fwd_ipv6        15433  905     1802             3662                    224
bpf_xdp.o      tail_handle_nat_fwd_ipv4        12917  875     1638             2986                    245
bpf_xdp.o      tail_nodeport_nat_egress_ipv4   13027  868     1628             2957                    227
bpf_xdp.o      tail_handle_nat_fwd_ipv6        13515  715     1391             2492                    215
bpf_xdp.o      tail_nodeport_nat_ingress_ipv4  7617   522     985              1736                    63
bpf_host.o     cil_to_netdev                   6047   362     783              1334                    48
bpf_xdp.o      tail_rev_nodeport_lb4           6808   403     761              1319                    77
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6  7575   383     722              1261                    63
bpf_host.o     tail_nodeport_nat_ingress_ipv4  5526   366     693              1196                    40
bpf_lxc.o      tail_nodeport_nat_ingress_ipv4  5526   366     693              1196                    40
bpf_overlay.o  tail_nodeport_nat_ingress_ipv4  5526   366     693              1196                    40

CILIUM AFTER
============
[vmuser@archvm bpf]$ sudo veristat -e file,prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-after-results-cilium.csv -s jmp_allocs -n 20
File           Program                         Insns  States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
-------------  ------------------------------  -----  ------  ---------------  ----------------------  ---------------------
bpf_xdp.o      tail_lb_ipv6                    78058  3523    6810             19903                   192
bpf_xdp.o      tail_lb_ipv4                    36367  2251    4293             11975                   205
bpf_host.o     tail_nodeport_nat_egress_ipv4   19862  1293    2568             7021                    305
bpf_overlay.o  tail_nodeport_nat_egress_ipv4   19490  1275    2552             6990                    303
bpf_xdp.o      tail_rev_nodeport_lb6           15847  990     1909             4533                    96
bpf_xdp.o      tail_handle_nat_fwd_ipv4        12443  849     1608             4177                    315
bpf_xdp.o      tail_nodeport_nat_egress_ipv4   12096  809     1523             4140                    290
bpf_xdp.o      tail_handle_nat_fwd_ipv6        13264  702     1378             3535                    271
bpf_host.o     tail_handle_nat_fwd_ipv4        10479  670     1325             3891                    315
bpf_lxc.o      tail_handle_nat_fwd_ipv4        10479  670     1325             3891                    315
bpf_host.o     tail_handle_nat_fwd_ipv6        11375  643     1292             3523                    288
bpf_lxc.o      tail_handle_nat_fwd_ipv6        11375  643     1292             3523                    288
bpf_overlay.o  tail_handle_nat_fwd_ipv4        10114  638     1266             3738                    274
bpf_xdp.o      tail_nodeport_nat_ingress_ipv4  5900   413     773              1891                    92
bpf_xdp.o      tail_rev_nodeport_lb4           6739   396     750              1899                    136
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6  7395   374     711              1732                    97
bpf_host.o     cil_to_netdev                   4578   249     512              1223                    97
bpf_host.o     tail_handle_ipv6_from_host      4168   244     499              1338                    91
bpf_host.o     tail_handle_ipv4_from_host      3434   231     477              1170                    97
bpf_host.o     tail_nodeport_nat_ingress_ipv4  3534   243     474              1344                    77

SELFTESTS BEFORE
================
[vmuser@archvm bpf]$ sudo veristat -e file,prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-before-results-selftests.csv -s jmp_allocs -n 20
File                                      Program                        Insns    States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
----------------------------------------  -----------------------------  -------  ------  ---------------  ----------------------  ---------------------
pyperf600_nounroll.bpf.linked3.o          on_event                       533132   34227   67332            201368                  15100
pyperf600.bpf.linked3.o                   on_event                       475837   22259   48488            125533                  9675
verifier_loops1.bpf.linked3.o             loop_after_a_conditional_jump  1000001  25000   25000            499983                  499999
strobemeta.bpf.linked3.o                  on_event                       180697   4780    20185            115993                  9208
pyperf180.bpf.linked3.o                   on_event                       118245   8422    17797            36579                   2881
test_verif_scale1.bpf.linked3.o           balancer_ingress               546742   8636    16439            43048                   270
test_verif_scale3.bpf.linked3.o           balancer_ingress               837487   8636    16439            43048                   270
xdp_synproxy_kern.bpf.linked3.o           syncookie_xdp                  85116    5162    15308            30910                   65
xdp_synproxy_kern.bpf.linked3.o           syncookie_tc                   82848    5107    15239            30812                   66
strobemeta_nounroll2.bpf.linked3.o        on_event                       104119   3820    12128            72765                   3388
test_cls_redirect.bpf.linked3.o           cls_redirect                   65594    4230    11683            18353                   50
pyperf100.bpf.linked3.o                   on_event                       72685    5123    11208            23467                   1630
test_cls_redirect_subprogs.bpf.linked3.o  cls_redirect                   57790    4063    9711             17719                   93
loop3.bpf.linked3.o                       while_true                     1000001  9663    9663             111106                  111111
test_verif_scale2.bpf.linked3.o           balancer_ingress               767498   3048    9144             21812                   90
strobemeta_subprogs.bpf.linked3.o         on_event                       52685    1653    5890             40180                   1636
pyperf50.bpf.linked3.o                    on_event                       36980    2623    5708             11967                   880
strobemeta_nounroll1.bpf.linked3.o        on_event                       49337    1706    5522             32940                   1552
loop1.bpf.linked3.o                       nested_loops                   361349   5504    5504             90288                   90300
pyperf_subprogs.bpf.linked3.o             on_event                       36029    2526    5425             11195                   890

SELFTESTS AFTER
===============
[vmuser@archvm bpf]$ sudo veristat -e file,prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-after-results-selftests.csv -s jmp_allocs -n 20
File                                      Program                        Insns    States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
----------------------------------------  -----------------------------  -------  ------  ---------------  ----------------------  ---------------------
pyperf600_nounroll.bpf.linked3.o          on_event                       533132   34227   67332            260526                  18282
pyperf600.bpf.linked3.o                   on_event                       475837   22259   48488            183880                  13455
verifier_loops1.bpf.linked3.o             loop_after_a_conditional_jump  1000001  25000   25000            25001                   25002
strobemeta.bpf.linked3.o                  on_event                       176036   4734    19835            147680                  13666
pyperf180.bpf.linked3.o                   on_event                       118245   8422    17797            50683                   3763
test_verif_scale1.bpf.linked3.o           balancer_ingress               546742   8636    16439            43048                   270
test_verif_scale3.bpf.linked3.o           balancer_ingress               837487   8636    16439            43048                   270
xdp_synproxy_kern.bpf.linked3.o           syncookie_tc                   81241    5155    15347            46763                   148
xdp_synproxy_kern.bpf.linked3.o           syncookie_xdp                  82297    5157    15321            49715                   148
strobemeta_nounroll2.bpf.linked3.o        on_event                       104119   3820    12128            80924                   3724
test_cls_redirect.bpf.linked3.o           cls_redirect                   65401    4212    11662            24862                   79
pyperf100.bpf.linked3.o                   on_event                       72685    5123    11208            36290                   3201
test_cls_redirect_subprogs.bpf.linked3.o  cls_redirect                   57790    4063    9711             24814                   146
loop3.bpf.linked3.o                       while_true                     1000001  9663    9663             333321                  333335
test_verif_scale2.bpf.linked3.o           balancer_ingress               767498   3048    9144             28787                   180
strobemeta_subprogs.bpf.linked3.o         on_event                       52685    1653    5890             45798                   1776
pyperf50.bpf.linked3.o                    on_event                       36980    2623    5708             18175                   1691
strobemeta_nounroll1.bpf.linked3.o        on_event                       49337    1706    5522             38476                   1718
loop1.bpf.linked3.o                       nested_loops                   361349   5504    5504             90288                   90300
pyperf_subprogs.bpf.linked3.o             on_event                       36029    2526    5425             18130                   1885


Stats counting diff:

$ git show -- kernel
commit febebc9586c08820fa927b1628454b2709e98e3f (HEAD)
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 9 11:02:40 2023 -0800

    [EXPERIMENT] bpf: add jump/insns history stats

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b688043e5460..d0f25f36221e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2026,6 +2026,10 @@ static int pop_stack(struct bpf_verifier_env *env, int *prev_insn_idx,
                return -ENOENT;

        if (cur) {
+               env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
+               env->jmp_hist_total += cur->insn_hist_end - cur->insn_hist_start;
+               env->jmp_hist_allocs += 1;
+
                err = copy_verifier_state(cur, &head->st);
                if (err)
                        return err;
@@ -3648,6 +3653,8 @@ static int push_jmp_history(struct bpf_verifier_env *env,
        p->idx = env->insn_idx;
        p->prev_idx = env->prev_insn_idx;
        cur->insn_hist_end++;
+
+       env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
        return 0;
 }

@@ -17205,6 +17212,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
        WARN_ONCE(new->branches != 1,
                  "BUG is_state_visited:branches_to_explore=%d insn %d\n", new->branches, insn_idx);

+       env->jmp_hist_total += cur->insn_hist_end - cur->insn_hist_start;
+       env->jmp_hist_allocs += 1;
+
        cur->parent = new;
        cur->first_insn_idx = insn_idx;
        cur->insn_hist_start = cur->insn_hist_end;
@@ -20170,10 +20180,12 @@ static void print_verification_stats(struct bpf_verifier_env *env)
                verbose(env, "\n");
        }
        verbose(env, "processed %d insns (limit %d) max_states_per_insn %d "
-               "total_states %d peak_states %d mark_read %d\n",
+               "total_states %d peak_states %d mark_read %d "
+               "jmp_allocs %d jmp_total %d jmp_peak %d\n",
                env->insn_processed, BPF_COMPLEXITY_LIMIT_INSNS,
                env->max_states_per_insn, env->total_states,
-               env->peak_states, env->longest_mark_read_walk);
+               env->peak_states, env->longest_mark_read_walk,
+               env->jmp_hist_allocs, env->jmp_hist_total, env->jmp_hist_peak);
 }

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-11-09 19:49           ` Andrii Nakryiko
@ 2023-11-09 20:39             ` Andrii Nakryiko
  2023-11-09 22:05               ` Alexei Starovoitov
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 20:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > >
> > > If we ever break the DFS property, we can easily change this. Or we
> > > can even have a hybrid: as long as traversal preserves the DFS
> > > property, we use the global shared history, but we can also
> > > optionally clone and keep our own history if necessary. It's a matter
> > > of adding an optional, potentially NULL pointer to a "local history".
> > > All this is very nicely hidden away from "normal" code.
> >
> > If we can "easily change this" then let's make it the last and
> > optional patch, so we can revert it in the future when we need to take
> > a non-DFS path.
>
> Ok, sounds good. I'll reorder and put it last, you can decide whether
> to apply it or not that way.
>
> >
> > > But again, let's look at data first. I'll get back with numbers soon.
> >
> > Sure. I think a memory increase due to more tracking is ok.
> > I suspect it won't cause a 2x increase. Likely a few %.
> > The last time I checked, the main memory hog was the states stashed
> > for pruning.
>
> So I'm back with data. See the verifier.c changes I did at the bottom,
> just to double-check I'm not missing something major. I count the
> number of allocations (an underestimate that doesn't take realloc into
> account), the total number of instruction history entries for the
> entire program verification, and also the peak "depth" of the
> instruction history. Note that entry counts should be multiplied by 8
> to get bytes (and that's not counting per-allocation overhead).
>
> Here are the top 20 results, sorted by number of allocs, for
> Meta-internal, Cilium, and selftests. BEFORE is without the added
> STACK_ACCESS tracking and the STACK_ZERO optimization. AFTER is with
> all the patches of this patch set applied.
>
> It's a few megabytes of memory allocation, which in itself is probably
> not a big deal. But it's an amount of unnecessary memory allocations
> (basically at least 2x the total number of states) that we can save,
> replacing them with just a few reallocs that grow the global jump
> history to a peak size that is orders of magnitude smaller.
>
> And if we ever decide to track more stuff similar to
> INSN_F_STACK_ACCESS, we won't have to worry about more allocations or
> more memory usage, because the absolute worst case is a global history
> of up to 1 million entries, tops. We could easily track *code path
> dependent* per-instruction information for *each simulated instruction*
> without having to think twice about it, which I think is a liberating
> thought that in itself justifies this change.
>
>

Gmail butchered the tables. See the GitHub gist ([0]) for a properly formatted version.

  [0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5

> [...]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-11-09 20:39             ` Andrii Nakryiko
@ 2023-11-09 22:05               ` Alexei Starovoitov
  2023-11-09 22:57                 ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2023-11-09 22:05 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 12:39 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > [...]
>
> Gmail butchered the tables. See the GitHub gist ([0]) for a properly formatted version.
>
>   [0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5

I think 'peak insn history' is the one to look for, since
it indicates total peak memory consumption. Right?
It seems the numbers point out a bug in number collection or
a bug in implementation.

before:
verifier_loops1.bpf.linked3.o peak=499999
loop3.bpf.linked3.o peak=111111

which makes sense, since both tests hit 1m insn.
I can see where 1/2 and 1/9 come from based on asm.

after:
verifier_loops1.bpf.linked3.o peak=25002
loop3.bpf.linked3.o peak=333335

So the 1st test got 20 times smaller memory footprint
while 2nd was 3 times higher.

Both are similar infinite loops.

The 1st one is:
l1_%=:  r0 += 1;                                        \
        goto l1_%=;                                     \

My understanding is that there should be all 500k jmps in history with
or without these patches.

So now I'm more worried about the correctness of the 1st patch.
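
Spelling out the arithmetic behind the "before" peaks above (assuming, per
the reasoning here, that each executed loop iteration records exactly one
jump in the history):

  verifier_loops1: 1,000,000 insn limit / 2-insn loop body -> ~500,000 recorded jumps (peak=499999)
  loop3:           1,000,000 insn limit / 9-insn loop body -> ~111,111 recorded jumps (peak=111111)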

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-11-09 22:05               ` Alexei Starovoitov
@ 2023-11-09 22:57                 ` Andrii Nakryiko
  2023-11-11  4:29                   ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 22:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 2:06 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> [...]
>
> I think 'peak insn history' is the one to look for, since
> it indicates total peak memory consumption. Right?

Hm... not really? Peak here is the longest sequence of recorded jumps
from the root state to any "current" state. I calculated that to know
how big the global history would need to be.

But it's definitely not total peak memory consumption, because there
will be states enqueued on the stack still to be processed, and we
keep their jmp_history around; see push_stack() and the
copy_verifier_state() we do there.
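
To make the bookkeeping concrete, here is a stand-alone user-space sketch
of the shared-history model (the insn_hist/insn_hist_start/insn_hist_end
names follow the patch; everything else is simplified and hypothetical):

#include <stdio.h>

struct insn_hist_entry { int idx; int prev_idx; };

/* env->insn_hist: one shared, growable array (fixed-size here) */
static struct insn_hist_entry insn_hist[64];

/* each verifier state only keeps its window into the shared array */
struct vstate { int insn_hist_start; int insn_hist_end; };

static void push_hist(struct vstate *st, int idx, int prev_idx)
{
        insn_hist[st->insn_hist_end].idx = idx;
        insn_hist[st->insn_hist_end].prev_idx = prev_idx;
        st->insn_hist_end++;
}

int main(void)
{
        struct vstate parent = { 0, 0 };

        push_hist(&parent, 1, 0);
        push_hist(&parent, 4, 1);

        /* DFS order: a child state's window starts where the parent's
         * ends, so nothing needs to be copied when a state is pushed
         */
        struct vstate child = { parent.insn_hist_end, parent.insn_hist_end };

        push_hist(&child, 7, 4);

        /* the full path from the root is entries [0, child.insn_hist_end);
         * the "peak" stat discussed here is the largest insn_hist_end
         * ever observed
         */
        for (int i = 0; i < child.insn_hist_end; i++)
                printf("insn %d (prev %d)\n", insn_hist[i].idx, insn_hist[i].prev_idx);

        return 0;
}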

> It seems the numbers point out a bug in number collection or
> a bug in implementation.

yeah, a bug in the accounting implementation, I suspect. I think I'm
not handling failing states properly.

I'll double-check and fix it up, but basically only failing BPF
programs should have bad accounting.

>
> before:
> verifier_loops1.bpf.linked3.o peak=499999
> loop3.bpf.linked3.o peak=111111
>
> which makes sense, since both tests hit 1m insn.
> I can see where 1/2 and 1/9 come from based on asm.
>
> after:
> verifier_loops1.bpf.linked3.o peak=25002
> loop3.bpf.linked3.o peak=333335
>
> So the 1st test got 20 times smaller memory footprint
> while 2nd was 3 times higher.
>
> Both are similar infinite loops.
>
> The 1st one is:
> l1_%=:  r0 += 1;                                        \
>         goto l1_%=;                                     \
>
> My understanding is that there should be all 500k jmps in history with
> or without these patches.
>
> So now I'm more worried about the correctness of the 1st patch.

I'll look closer at what's going on and will report back.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
  2023-11-09 18:20       ` Eduard Zingerman
@ 2023-11-10  5:48         ` Andrii Nakryiko
  2023-11-12  1:57           ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-10  5:48 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
	Tao Lyu

On Thu, Nov 9, 2023 at 10:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2023-11-09 at 09:20 -0800, Andrii Nakryiko wrote:
> [...]
> > > >  struct bpf_insn_hist_entry {
> > > > -     u32 prev_idx;
> > > >       u32 idx;
> > > > +     /* insn idx can't be bigger than 1 million */
> > > > +     u32 prev_idx : 22;
> > > > +     /* special flags, e.g., whether insn is doing register stack spill/load */
> > > > +     u32 flags : 10;
> > > >  };
> > >
> > > Nitpick: maybe use separate bit-fields for frameno and spi instead of
> > >          flags? Or add dedicated accessor functions?
> >
> > I wanted to keep it very uniform so that push_insn_history() doesn't
> > know about all such details. It just has "flags". We might use these
> > flags for some other use cases, though if we run out of bits we'll
> > probably just expand bpf_insn_hist_entry and refactor existing code
> > anyways. So, basically, I didn't want to over-engineer this bit too
> > much :)
>
> Well, maybe hide "(hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK"
> behind an accessor?

I'll add a single-line helper function just to not be a PITA, but I
don't think it's worth it. There are only two places we do this, one
right next to the other within the same function. This helper is just
going to add mental overhead and won't really help us with anything.

>
> [...]
>
> > > > +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> > > > +                          int insn_flags)
> > > >  {
> > > >       struct bpf_insn_hist_entry *p;
> > > >       size_t alloc_size;
> > > >
> > > > -     if (!is_jmp_point(env, env->insn_idx))
> > > > +     /* combine instruction flags if we already recorded this instruction */
> > > > +     if (cur->insn_hist_end > cur->insn_hist_start &&
> > > > +         (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> > > > +         p->idx == env->insn_idx &&
> > > > +         p->prev_idx == env->prev_insn_idx) {
> > > > +             p->flags |= insn_flags;
> > >
> > > Nitpick: maybe add an assert to check that frameno/spi are not or'ed?
> >
> > ok, something like
> >
> > WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS | INSN_F_FRAMENO_MASK |
> > (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
> >
> > ?
>
> Something like this, yes.
>

I added it, and I hate it. It's just visual noise. Feels too
paranoid; I'll probably drop it.

> [...]
>
> > > > @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> > > >
> > > >               /* Mark slots affected by this stack write. */
> > > >               for (i = 0; i < size; i++)
> > > > -                     state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> > > > -                             type;
> > > > +                     state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> > > > +             insn_flags = 0; /* not a register spill */
> > > >       }
> > > > +
> > > > +     if (insn_flags)
> > > > +             return push_insn_history(env, env->cur_state, insn_flags);
> > >
> > > Maybe add a check that insn is BPF_ST or BPF_STX here?
> > > Only these cases are supported by backtrack_insn() while
> > > check_mem_access() is called from multiple places.
> >
> > seems like a wrong place to enforce that check_stack_write_fixed_off()
> > is called only for those instructions?
>
> check_stack_write_fixed_off() is called from check_stack_write() which
> is called from check_mem_access() which might trigger
> check_stack_write_fixed_off() when called with BPF_WRITE flag and
> pointer to stack as an argument.
> This happens for ST, STX but also in check_helper_call(),
> process_iter_arg() (maybe other places).
> Speaking of which, should this be handled in backtrack_insn()?

Note that we set insn_flags only for cases where we do an actual
register spill (save_register_state() calls for non-fake registers). If
a register spill were somehow possible from a helper call, we'd be in
much bigger trouble elsewhere.

>
> > [...]
> >
> > trimming is good
>
> Sigh... sorry, really tried to trim everything today.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
  2023-11-09 22:57                 ` Andrii Nakryiko
@ 2023-11-11  4:29                   ` Andrii Nakryiko
  0 siblings, 0 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-11  4:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 9, 2023 at 2:57 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 2:06 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > [...]
> >
> > I think 'peak insn history' is the one to look for, since
> > it indicates total peak memory consumption. Right?
>
> Hm... not really? Peak here is the longest sequence of recorded jumps
> from the root state to any "current" state. I calculated that to know
> how big the global history would need to be.
>
> But it's definitely not total peak memory consumption, because there
> will be states enqueued on the stack still to be processed, and we
> keep their jmp_history around; see push_stack() and the
> copy_verifier_state() we do there.
>
> > It seems the numbers point out a bug in number collection or
> > a bug in implementation.
>
> yeah, a bug in the accounting implementation, I suspect. I think I'm
> not handling failing states properly.
>
> I'll double-check and fix it up, but basically only failing BPF
> programs should have bad accounting.

Alexei, your intuition was right! There is indeed a bug in patch #2.
Funnily enough, it's conceptually the same bug I just fixed in the
backtracking logic ([0]). Basically, we cannot rely on comparing
instruction indices for equality to make sure it's exactly the same
simulated instruction (it could be an instruction with the same index,
but from earlier in the verification history). I also have a bit of
accounting imprecision in the jmp_total stat for the last failed
state, but I didn't bother fixing it because it's just off by a few,
and only for failed validations.

The good news is that this bug actually doesn't affect the results at
all, except for that one verifier_loops1.c case (for the same reason
the bug in [0] wasn't reported earlier: it's a very rare situation in
real-world BPF programs). See the updated results in [1].

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@kernel.org/
  [1] https://gist.github.com/anakryiko/4e61d28f1a2caecea4315e50e4346120

Anyway, the fix is pretty straightforward, if not the most elegant.
I'll roll it into patch #2 for the next revision (it will become patch
#1, because I moved the common history refactoring to be the last one,
as agreed). I still need to add the tests and other things Eduard
requested.

commit f11d05fb037f6a69eff8c7d4eff4c422374af37f (HEAD -> bpf-verif-jmp-history)
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Fri Nov 10 20:12:05 2023 -0800

    [FIX] remember current insn hist entry to reuse for flags

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 75d9507a8a9f..42a7619dcf80 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -693,6 +693,7 @@ struct bpf_verifier_env {
         */
        char tmp_str_buf[TMP_STR_BUF_LEN];
        struct bpf_insn_hist_entry *insn_hist;
+       struct bpf_insn_hist_entry *cur_hist_ent;
        u32 insn_hist_cap;
 };

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2878077e0a54..eebda0367dca 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3652,13 +3652,8 @@ static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_s
        size_t alloc_size;

        /* combine instruction flags if we already recorded this instruction */
-       if (cur->insn_hist_end > cur->insn_hist_start &&
-           (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
-           p->idx == env->insn_idx &&
-           p->prev_idx == env->prev_insn_idx) {
-               WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS |
-                            INSN_F_FRAMENO_MASK | (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
-               p->flags |= insn_flags;
+       if (env->cur_hist_ent) {
+               env->cur_hist_ent->flags |= insn_flags;
                return 0;
        }

@@ -3676,6 +3671,7 @@ static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_s
        p->idx = env->insn_idx;
        p->prev_idx = env->prev_insn_idx;
        p->flags = insn_flags;
+       env->cur_hist_ent = p;
        cur->insn_hist_end++;

        env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
@@ -17408,6 +17404,9 @@ static int do_check(struct bpf_verifier_env *env)
                u8 class;
                int err;

+               /* reset current history entry on each new instruction */
+               env->cur_hist_ent = NULL;
+
                env->prev_insn_idx = prev_insn_idx;
                if (env->insn_idx >= insn_cnt) {
                        verbose(env, "invalid insn idx %d insn_cnt %d\n",

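To illustrate the failure mode with a hypothetical, simplified trace: in a
loop, the same (idx, prev_idx) pair can be recorded once per iteration, so
the old equality check could merge flags into a stale entry left over from
an earlier iteration:

  iteration 1: push_insn_history(idx=7, prev_idx=5)  -> records entry E
  ... verification continues, insn 7 is reached again via the 5 -> 7 edge ...
  iteration 2: spill at insn 7; the last entry E still "matches"
               (idx=7, prev_idx=5), so the new flags get OR'ed into E,
               which belongs to iteration 1

Tracking env->cur_hist_ent and resetting it at the top of the do_check()
loop guarantees flags can only be merged into an entry created for the
instruction currently being simulated.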


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
  2023-11-10  5:48         ` Andrii Nakryiko
@ 2023-11-12  1:57           ` Andrii Nakryiko
  2023-11-12 14:05             ` Eduard Zingerman
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-12  1:57 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
	Tao Lyu

On Thu, Nov 9, 2023 at 9:48 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [...]
>

I ended up with these changes on top of this patch:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 23dbfb5022ba..d234c6f53741 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3167,6 +3167,21 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
        return 0;
 }

+static int insn_stack_access_flags(int frameno, int spi)
+{
+       return INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | frameno;
+}
+
+static int insn_stack_access_spi(int insn_flags)
+{
+       return (insn_flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
+}
+
+static int insn_stack_access_frameno(int insn_flags)
+{
+       return insn_flags & INSN_F_FRAMENO_MASK;
+}
+
 static void mark_jmp_point(struct bpf_verifier_env *env, int idx)
 {
        env->insn_aux_data[idx].jmp_point = true;
@@ -3187,6 +3202,7 @@ static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_s

        /* combine instruction flags if we already recorded this instruction */
        if (env->cur_hist_ent) {
+               WARN_ON_ONCE(env->cur_hist_ent->flags & insn_flags);
                env->cur_hist_ent->flags |= insn_flags;
                return 0;
        }
@@ -3499,8 +3515,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
                 * that [fp - off] slot contains scalar that needs to be
                 * tracked with precision
                 */
-               spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
-               fr = hist->flags & INSN_F_FRAMENO_MASK;
+               spi = insn_stack_access_spi(hist->flags);
+               fr = insn_stack_access_frameno(hist->flags);
                bt_set_frame_slot(bt, fr, spi);
        } else if (class == BPF_STX || class == BPF_ST) {
                if (bt_is_reg_set(bt, dreg))
@@ -3512,8 +3528,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
                /* scalars can only be spilled into stack */
                if (!hist || !(hist->flags & INSN_F_STACK_ACCESS))
                        return 0;
-               spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
-               fr = hist->flags & INSN_F_FRAMENO_MASK;
+               spi = insn_stack_access_spi(hist->flags);
+               fr = insn_stack_access_frameno(hist->flags);
                if (!bt_is_frame_slot_set(bt, fr, spi))
                        return 0;
                bt_clear_frame_slot(bt, fr, spi);
@@ -4322,7 +4338,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
        int i, slot = -off - 1, spi = slot / BPF_REG_SIZE, err;
        struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
        struct bpf_reg_state *reg = NULL;
-       int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | state->frameno;
+       int insn_flags = insn_stack_access_flags(state->frameno, spi);

        err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE));
        if (err)
@@ -4618,7 +4634,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
        int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
        struct bpf_reg_state *reg;
        u8 *stype, type;
-       int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | reg_state->frameno;
+       int insn_flags = insn_stack_access_flags(reg_state->frameno, spi);

        stype = reg_state->stack[spi].slot_type;
        reg = &reg_state->stack[spi].spilled_ptr;

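For reference, here is one possible layout of the 10 flag bits that is
consistent with the helpers above (the concrete widths are an assumption
for illustration, not quoted from the patch): 3 bits for frameno
(MAX_CALL_FRAMES == 8), 6 bits for spi (a 512-byte stack has 64 8-byte
slots), and 1 bit marking the stack access itself:

#define INSN_F_FRAMENO_MASK     0x7       /* bits 0-2: frame number (assumed width) */
#define INSN_F_SPI_SHIFT        3
#define INSN_F_SPI_MASK         0x3f      /* bits 3-8: stack slot index (assumed width) */
#define INSN_F_STACK_ACCESS     (1 << 9)  /* bit 9: register spill/fill marker */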



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
  2023-11-12  1:57           ` Andrii Nakryiko
@ 2023-11-12 14:05             ` Eduard Zingerman
  0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-12 14:05 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
	Tao Lyu

On Sat, 2023-11-11 at 17:57 -0800, Andrii Nakryiko wrote:
[...]
> I ended up with these changes on top of this patch:
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 23dbfb5022ba..d234c6f53741 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3167,6 +3167,21 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
>         return 0;
>  }
> 
> +static int insn_stack_access_flags(int frameno, int spi)
> +{
> +       return INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | frameno;
> +}
> +
> +static int insn_stack_access_spi(int insn_flags)
> +{
> +       return (insn_flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
> +}
> +
> +static int insn_stack_access_frameno(int insn_flags)
> +{
> +       return insn_flags & INSN_F_FRAMENO_MASK;
> +}

Looks good, thank you.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2023-11-12 14:05 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-31  5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
2023-10-31  5:03 ` [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states Andrii Nakryiko
2023-11-09 15:20   ` Eduard Zingerman
2023-11-09 16:13     ` Alexei Starovoitov
2023-11-09 17:28       ` Andrii Nakryiko
2023-11-09 19:29         ` Alexei Starovoitov
2023-11-09 19:49           ` Andrii Nakryiko
2023-11-09 20:39             ` Andrii Nakryiko
2023-11-09 22:05               ` Alexei Starovoitov
2023-11-09 22:57                 ` Andrii Nakryiko
2023-11-11  4:29                   ` Andrii Nakryiko
2023-10-31  5:03 ` [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
2023-11-09 15:20   ` Eduard Zingerman
2023-11-09 17:20     ` Andrii Nakryiko
2023-11-09 18:20       ` Eduard Zingerman
2023-11-10  5:48         ` Andrii Nakryiko
2023-11-12  1:57           ` Andrii Nakryiko
2023-11-12 14:05             ` Eduard Zingerman
2023-10-31  5:03 ` [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return Andrii Nakryiko
2023-11-09 15:20   ` Eduard Zingerman
2023-11-09 17:32     ` Andrii Nakryiko
2023-11-09 17:38       ` Eduard Zingerman
2023-11-09 17:50         ` Andrii Nakryiko
2023-11-09 17:58           ` Alexei Starovoitov
2023-11-09 18:01             ` Andrii Nakryiko
2023-11-09 18:03               ` Eduard Zingerman
2023-11-09 18:00           ` Eduard Zingerman
2023-10-31  5:03 ` [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
2023-11-09 15:20   ` Eduard Zingerman
2023-10-31  5:03 ` [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
2023-11-09 15:20   ` Eduard Zingerman
2023-11-09 17:37     ` Andrii Nakryiko
2023-11-09 17:54       ` Eduard Zingerman
2023-10-31  5:03 ` [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
2023-11-09 15:20   ` Eduard Zingerman
2023-11-09 17:41     ` Andrii Nakryiko
2023-11-09 19:34       ` Eduard Zingerman
2023-10-31  5:03 ` [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
2023-10-31  5:22   ` Andrii Nakryiko
2023-11-01  7:56     ` Jiri Olsa
2023-11-01 16:27       ` Andrii Nakryiko
2023-11-02  9:54         ` Jiri Olsa
2023-11-09 15:21   ` Eduard Zingerman
2023-11-09 17:43     ` Andrii Nakryiko
2023-11-09 17:44       ` Eduard Zingerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox