BPF List
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills
@ 2023-11-21  0:22 Andrii Nakryiko
  2023-11-21  0:22 ` [PATCH v2 bpf-next 01/10] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

*NOTE* this patch set conflicts with a fix [0] in bpf tree, so this has to
wait until bpf and bpf-next trees converge to be rebased. I'm still submitting
it for early review and discussion.

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@kernel.org/

Add support to BPF verifier to track and support register spill/fill to/from
stack regardless if it was done through read-only R10 register (which is the
only form supported today), or through a general register after copying R10
into it, while also potentially modifying offset.

Once we add register this generic spill/fill support to precision
backtracking, we can take advantage of it to stop doing eager STACK_ZERO
conversion on register spill. Instead we can rely on (im)precision of spilled
const zero register to improve verifier state pruning efficiency. This
situation of using const zero register to initialize stack slots is very
common with __builtin_memset() usage or just zero-initializing variables on
the stack, and it causes unnecessary state duplication, as that STACK_ZERO
knowledge is often not necessary for correctness, as those zero values are
never used in precise context. Thus, relying on register imprecision helps
tremendously, especially in real-world BPF programs.

To make spilled const zero register behave completely equivalently to
STACK_ZERO, we need to improve few other small pieces, which is done in the
second part of the patch set. See individual patches for details. There are
also two small bug fixes spotted during STACK_ZERO debugging.

The patch set consists of logically three changes:
  - patch #1 (and corresponding tests in patch #2) is fixing/impoving precision
    propagation for stack spills/fills. This can be landed as a stand-alone
    improvement;
  - patches #3 through #9 is improving verification scalability by utilizing
    register (im)precision instead of eager STACK_ZERO. These changes depend
    on patch #1.
  - patch #10 is a memory efficiency improvement to how instruction/jump
    history is tracked and maintained. It depends on patch #1, but is not
    strictly speaking required, even though I believe it's a good long-term
    solution to have a path-dependent per-instruction information. Kind
    of like a path-dependent counterpart to path-agnostic insn_aux array.

v1->v2:
  - clean ups, WARN_ONCE(), insn_flags helpers added (Eduard);
  - added more selftests for STACK_ZERO/STACK_MISC cases (Eduard);
  - a bit more detailed explanation of effect of avoiding STACK_ZERO in favor
    of register spill in patch #8 commit (Alexei);
  - global shared instruction history refactoring moved to be the last patch
    in the series to make it easier to revert it, if applied (Alexei).

Andrii Nakryiko (10):
  bpf: support non-r10 register spill/fill to/from stack in precision
    tracking
  selftests/bpf: add stack access precision test
  bpf: fix check for attempt to corrupt spilled pointer
  bpf: preserve STACK_ZERO slots on partial reg spills
  selftests/bpf: validate STACK_ZERO is preserved on subreg spill
  bpf: preserve constant zero when doing partial register restore
  selftests/bpf: validate zero preservation for sub-slot loads
  bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  selftests/bpf: validate precision logic in
    partial_stack_load_preserves_zeros
  bpf: use common instruction history across all states

 include/linux/bpf_verifier.h                  |  42 ++-
 kernel/bpf/verifier.c                         | 294 +++++++++++-------
 .../selftests/bpf/progs/verifier_spill_fill.c | 113 +++++++
 .../bpf/progs/verifier_subprog_precision.c    |  87 +++++-
 .../testing/selftests/bpf/verifier/precise.c  |  38 ++-
 5 files changed, 423 insertions(+), 151 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 01/10] bpf: support non-r10 register spill/fill to/from stack in precision tracking
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21  0:22 ` [PATCH v2 bpf-next 02/10] selftests/bpf: add stack access precision test Andrii Nakryiko
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau
  Cc: andrii, kernel-team, Eduard Zingerman, Tao Lyu

Use instruction (jump) history to record instructions that performed
register spill/fill to/from stack, regardless if this was done through
read-only r10 register, or any other register after copying r10 into it
*and* potentially adjusting offset.

To make this work reliably, we push extra per-instruction flags into
instruction history, encoding stack slot index (spi) and stack frame
number in extra 10 bit flags we take away from prev_idx in instruction
history. We don't touch idx field for maximum performance, as it's
checked most frequently during backtracking.

This change removes basically the last remaining practical limitation of
precision backtracking logic in BPF verifier. It fixes known
deficiencies, but also opens up new opportunities to reduce number of
verified states, explored in the subsequent patches.

There are only three differences in selftests' BPF object files
according to veristat, all in the positive direction (less states).

File                                    Program        Insns (A)  Insns (B)  Insns  (DIFF)  States (A)  States (B)  States (DIFF)
--------------------------------------  -------------  ---------  ---------  -------------  ----------  ----------  -------------
test_cls_redirect_dynptr.bpf.linked3.o  cls_redirect        2987       2864  -123 (-4.12%)         240         231    -9 (-3.75%)
xdp_synproxy_kern.bpf.linked3.o         syncookie_tc       82848      82661  -187 (-0.23%)        5107        5073   -34 (-0.67%)
xdp_synproxy_kern.bpf.linked3.o         syncookie_xdp      85116      84964  -152 (-0.18%)        5162        5130   -32 (-0.62%)

Note, I avoided renaming jmp_history to more generic insn_hist to
minimize number of lines changed and potential merge conflicts between
bpf and bpf-next trees.

Notice also cur_hist_entry pointer reset to NULL at the beginning of
instruction verification loop. This pointer avoids the problem of
relying on last jump history entry's insn_idx to determine whether we
already have entry for current instruction or not. It can happen that we
added jump history entry because current instruction is_jmp_point(), but
also we need to add instruction flags for stack access. In this case, we
don't want to entries, so we need to reuse last added entry, if it is
present.

Relying on insn_idx comparison has the same ambiguity problem as the one
that was fixed recently in [0], so we avoid that.

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@kernel.org/

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reported-by: Tao Lyu <tao.lyu@epfl.ch>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf_verifier.h                  |  31 +++-
 kernel/bpf/verifier.c                         | 175 ++++++++++--------
 .../bpf/progs/verifier_subprog_precision.c    |  23 ++-
 .../testing/selftests/bpf/verifier/precise.c  |  38 ++--
 4 files changed, 169 insertions(+), 98 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 39edc76f436e..1901056ee61c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -309,12 +309,34 @@ struct bpf_func_state {
 	struct bpf_stack_state *stack;
 };
 
-struct bpf_idx_pair {
-	u32 prev_idx;
+#define MAX_CALL_FRAMES 8
+
+/* instruction history flags, used in bpf_jmp_history_entry.flags field */
+enum {
+	/* instruction references stack slot through PTR_TO_STACK register;
+	 * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
+	 * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512,
+	 * 8 bytes per slot, so slot index (spi) is [0, 63])
+	 */
+	INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */
+
+	INSN_F_SPI_MASK = 0x3f, /* 6 bits */
+	INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */
+
+	INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */
+};
+
+static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
+static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
+
+struct bpf_jmp_history_entry {
 	u32 idx;
+	/* insn idx can't be bigger than 1 million */
+	u32 prev_idx : 22;
+	/* special flags, e.g., whether insn is doing register stack spill/load */
+	u32 flags : 10;
 };
 
-#define MAX_CALL_FRAMES 8
 /* Maximum number of register states that can exist at once */
 #define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES)
 struct bpf_verifier_state {
@@ -397,7 +419,7 @@ struct bpf_verifier_state {
 	 * For most states jmp_history_cnt is [0-3].
 	 * For loops can go up to ~40.
 	 */
-	struct bpf_idx_pair *jmp_history;
+	struct bpf_jmp_history_entry *jmp_history;
 	u32 jmp_history_cnt;
 	u32 dfs_depth;
 };
@@ -635,6 +657,7 @@ struct bpf_verifier_env {
 		int cur_stack;
 	} cfg;
 	struct backtrack_state bt;
+	struct bpf_jmp_history_entry *cur_hist_ent;
 	u32 pass_cnt; /* number of times do_check() was called */
 	u32 subprog_cnt;
 	/* number of instructions analyzed by the verifier */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8c2d31aa3d31..c22b557fe30f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1320,8 +1320,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	int i, err;
 
 	dst_state->jmp_history = copy_array(dst_state->jmp_history, src->jmp_history,
-					    src->jmp_history_cnt, sizeof(struct bpf_idx_pair),
-					    GFP_USER);
+					  src->jmp_history_cnt, sizeof(*dst_state->jmp_history),
+					  GFP_USER);
 	if (!dst_state->jmp_history)
 		return -ENOMEM;
 	dst_state->jmp_history_cnt = src->jmp_history_cnt;
@@ -3173,6 +3173,21 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static int insn_stack_access_flags(int frameno, int spi)
+{
+	return INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | frameno;
+}
+
+static int insn_stack_access_spi(int insn_flags)
+{
+	return (insn_flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
+}
+
+static int insn_stack_access_frameno(int insn_flags)
+{
+	return insn_flags & INSN_F_FRAMENO_MASK;
+}
+
 static void mark_jmp_point(struct bpf_verifier_env *env, int idx)
 {
 	env->insn_aux_data[idx].jmp_point = true;
@@ -3184,28 +3199,51 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
 }
 
 /* for any branch, call, exit record the history of jmps in the given state */
-static int push_jmp_history(struct bpf_verifier_env *env,
-			    struct bpf_verifier_state *cur)
+static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
+			    int insn_flags)
 {
 	u32 cnt = cur->jmp_history_cnt;
-	struct bpf_idx_pair *p;
+	struct bpf_jmp_history_entry *p;
 	size_t alloc_size;
 
-	if (!is_jmp_point(env, env->insn_idx))
+	/* combine instruction flags if we already recorded this instruction */
+	if (env->cur_hist_ent) {
+		/* atomic instructions push insn_flags twice, for READ and
+		 * WRITE sides, but they should agree on stack slot
+		 */
+		WARN_ONCE((env->cur_hist_ent->flags & insn_flags) &&
+			  (env->cur_hist_ent->flags & insn_flags) != insn_flags,
+			  "verifier insn history bug: insn_idx %d cur flags %x new flags %x\n",
+			  env->insn_idx, env->cur_hist_ent->flags, insn_flags);
+		env->cur_hist_ent->flags |= insn_flags;
 		return 0;
+	}
 
 	cnt++;
 	alloc_size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p)));
 	p = krealloc(cur->jmp_history, alloc_size, GFP_USER);
 	if (!p)
 		return -ENOMEM;
-	p[cnt - 1].idx = env->insn_idx;
-	p[cnt - 1].prev_idx = env->prev_insn_idx;
 	cur->jmp_history = p;
+
+	p = &cur->jmp_history[cnt - 1];
+	p->idx = env->insn_idx;
+	p->prev_idx = env->prev_insn_idx;
+	p->flags = insn_flags;
 	cur->jmp_history_cnt = cnt;
+	env->cur_hist_ent = p;
+
 	return 0;
 }
 
+static struct bpf_jmp_history_entry *get_jmp_hist_entry(struct bpf_verifier_state *st,
+						        u32 hist_end, int insn_idx)
+{
+	if (hist_end > 0 && st->jmp_history[hist_end - 1].idx == insn_idx)
+		return &st->jmp_history[hist_end - 1];
+	return NULL;
+}
+
 /* Backtrack one insn at a time. If idx is not at the top of recorded
  * history then previous instruction came from straight line execution.
  */
@@ -3350,9 +3388,14 @@ static inline bool bt_is_reg_set(struct backtrack_state *bt, u32 reg)
 	return bt->reg_masks[bt->frame] & (1 << reg);
 }
 
+static inline bool bt_is_frame_slot_set(struct backtrack_state *bt, u32 frame, u32 slot)
+{
+	return bt->stack_masks[frame] & (1ull << slot);
+}
+
 static inline bool bt_is_slot_set(struct backtrack_state *bt, u32 slot)
 {
-	return bt->stack_masks[bt->frame] & (1ull << slot);
+	return bt_is_frame_slot_set(bt, bt->frame, slot);
 }
 
 /* format registers bitmask, e.g., "r0,r2,r4" for 0x15 mask */
@@ -3404,7 +3447,7 @@ static void fmt_stack_mask(char *buf, ssize_t buf_sz, u64 stack_mask)
  *   - *was* processed previously during backtracking.
  */
 static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
-			  struct backtrack_state *bt)
+			  struct bpf_jmp_history_entry *hist, struct backtrack_state *bt)
 {
 	const struct bpf_insn_cbs cbs = {
 		.cb_call	= disasm_kfunc_name,
@@ -3417,7 +3460,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 	u8 mode = BPF_MODE(insn->code);
 	u32 dreg = insn->dst_reg;
 	u32 sreg = insn->src_reg;
-	u32 spi, i;
+	u32 spi, i, fr;
 
 	if (insn->code == 0)
 		return 0;
@@ -3478,20 +3521,15 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 		 * by 'precise' mark in corresponding register of this state.
 		 * No further tracking necessary.
 		 */
-		if (insn->src_reg != BPF_REG_FP)
+		if (!hist || !(hist->flags & INSN_F_STACK_ACCESS))
 			return 0;
-
 		/* dreg = *(u64 *)[fp - off] was a fill from the stack.
 		 * that [fp - off] slot contains scalar that needs to be
 		 * tracked with precision
 		 */
-		spi = (-insn->off - 1) / BPF_REG_SIZE;
-		if (spi >= 64) {
-			verbose(env, "BUG spi %d\n", spi);
-			WARN_ONCE(1, "verifier backtracking bug");
-			return -EFAULT;
-		}
-		bt_set_slot(bt, spi);
+		spi = insn_stack_access_spi(hist->flags);
+		fr = insn_stack_access_frameno(hist->flags);
+		bt_set_frame_slot(bt, fr, spi);
 	} else if (class == BPF_STX || class == BPF_ST) {
 		if (bt_is_reg_set(bt, dreg))
 			/* stx & st shouldn't be using _scalar_ dst_reg
@@ -3500,17 +3538,13 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 			 */
 			return -ENOTSUPP;
 		/* scalars can only be spilled into stack */
-		if (insn->dst_reg != BPF_REG_FP)
+		if (!hist || !(hist->flags & INSN_F_STACK_ACCESS))
 			return 0;
-		spi = (-insn->off - 1) / BPF_REG_SIZE;
-		if (spi >= 64) {
-			verbose(env, "BUG spi %d\n", spi);
-			WARN_ONCE(1, "verifier backtracking bug");
-			return -EFAULT;
-		}
-		if (!bt_is_slot_set(bt, spi))
+		spi = insn_stack_access_spi(hist->flags);
+		fr = insn_stack_access_frameno(hist->flags);
+		if (!bt_is_frame_slot_set(bt, fr, spi))
 			return 0;
-		bt_clear_slot(bt, spi);
+		bt_clear_frame_slot(bt, fr, spi);
 		if (class == BPF_STX)
 			bt_set_reg(bt, sreg);
 	} else if (class == BPF_JMP || class == BPF_JMP32) {
@@ -3554,10 +3588,14 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 					WARN_ONCE(1, "verifier backtracking bug");
 					return -EFAULT;
 				}
-				/* we don't track register spills perfectly,
-				 * so fallback to force-precise instead of failing */
-				if (bt_stack_mask(bt) != 0)
-					return -ENOTSUPP;
+				/* we are now tracking register spills correctly,
+				 * so any instance of leftover slots is a bug
+				 */
+				if (bt_stack_mask(bt) != 0) {
+					verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt));
+					WARN_ONCE(1, "verifier backtracking bug (subprog leftover stack slots)");
+					return -EFAULT;
+				}
 				/* propagate r1-r5 to the caller */
 				for (i = BPF_REG_1; i <= BPF_REG_5; i++) {
 					if (bt_is_reg_set(bt, i)) {
@@ -3585,8 +3623,11 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 				WARN_ONCE(1, "verifier backtracking bug");
 				return -EFAULT;
 			}
-			if (bt_stack_mask(bt) != 0)
-				return -ENOTSUPP;
+			if (bt_stack_mask(bt) != 0) {
+				verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt));
+				WARN_ONCE(1, "verifier backtracking bug (callback leftover stack slots)");
+				return -EFAULT;
+			}
 			/* clear r1-r5 in callback subprog's mask */
 			for (i = BPF_REG_1; i <= BPF_REG_5; i++)
 				bt_clear_reg(bt, i);
@@ -4015,6 +4056,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 	for (;;) {
 		DECLARE_BITMAP(mask, 64);
 		u32 history = st->jmp_history_cnt;
+		struct bpf_jmp_history_entry *hist;
 
 		if (env->log.level & BPF_LOG_LEVEL2) {
 			verbose(env, "mark_precise: frame%d: last_idx %d first_idx %d subseq_idx %d \n",
@@ -4078,7 +4120,8 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 				err = 0;
 				skip_first = false;
 			} else {
-				err = backtrack_insn(env, i, subseq_idx, bt);
+				hist = get_jmp_hist_entry(st, history, i);
+				err = backtrack_insn(env, i, subseq_idx, hist, bt);
 			}
 			if (err == -ENOTSUPP) {
 				mark_all_scalars_precise(env, env->cur_state);
@@ -4131,22 +4174,10 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 			bitmap_from_u64(mask, bt_frame_stack_mask(bt, fr));
 			for_each_set_bit(i, mask, 64) {
 				if (i >= func->allocated_stack / BPF_REG_SIZE) {
-					/* the sequence of instructions:
-					 * 2: (bf) r3 = r10
-					 * 3: (7b) *(u64 *)(r3 -8) = r0
-					 * 4: (79) r4 = *(u64 *)(r10 -8)
-					 * doesn't contain jmps. It's backtracked
-					 * as a single block.
-					 * During backtracking insn 3 is not recognized as
-					 * stack access, so at the end of backtracking
-					 * stack slot fp-8 is still marked in stack_mask.
-					 * However the parent state may not have accessed
-					 * fp-8 and it's "unallocated" stack space.
-					 * In such case fallback to conservative.
-					 */
-					mark_all_scalars_precise(env, env->cur_state);
-					bt_reset(bt);
-					return 0;
+					verbose(env, "BUG backtracking (stack slot %d, total slots %d)\n",
+						i, func->allocated_stack / BPF_REG_SIZE);
+					WARN_ONCE(1, "verifier backtracking bug (stack slot out of bounds)");
+					return -EFAULT;
 				}
 
 				if (!is_spilled_scalar_reg(&func->stack[i])) {
@@ -4319,7 +4350,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	int i, slot = -off - 1, spi = slot / BPF_REG_SIZE, err;
 	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
 	struct bpf_reg_state *reg = NULL;
-	u32 dst_reg = insn->dst_reg;
+	int insn_flags = insn_stack_access_flags(state->frameno, spi);
 
 	err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE));
 	if (err)
@@ -4360,17 +4391,6 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	mark_stack_slot_scratched(env, spi);
 	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
 	    !register_is_null(reg) && env->bpf_capable) {
-		if (dst_reg != BPF_REG_FP) {
-			/* The backtracking logic can only recognize explicit
-			 * stack slot address like [fp - 8]. Other spill of
-			 * scalar via different register has to be conservative.
-			 * Backtrack from here and mark all registers as precise
-			 * that contributed into 'reg' being a constant.
-			 */
-			err = mark_chain_precision(env, value_regno);
-			if (err)
-				return err;
-		}
 		save_register_state(state, spi, reg, size);
 		/* Break the relation on a narrowing spill. */
 		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
@@ -4382,6 +4402,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 		__mark_reg_known(&fake_reg, insn->imm);
 		fake_reg.type = SCALAR_VALUE;
 		save_register_state(state, spi, &fake_reg, size);
+		insn_flags = 0; /* not a register spill */
 	} else if (reg && is_spillable_regtype(reg->type)) {
 		/* register containing pointer is being spilled into stack */
 		if (size != BPF_REG_SIZE) {
@@ -4427,9 +4448,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 
 		/* Mark slots affected by this stack write. */
 		for (i = 0; i < size; i++)
-			state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
-				type;
+			state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
+		insn_flags = 0; /* not a register spill */
 	}
+
+	if (insn_flags)
+		return push_jmp_history(env, env->cur_state, insn_flags);
 	return 0;
 }
 
@@ -4622,6 +4646,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 	int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
 	struct bpf_reg_state *reg;
 	u8 *stype, type;
+	int insn_flags = insn_stack_access_flags(reg_state->frameno, spi);
 
 	stype = reg_state->stack[spi].slot_type;
 	reg = &reg_state->stack[spi].spilled_ptr;
@@ -4667,12 +4692,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 					return -EACCES;
 				}
 				mark_reg_unknown(env, state->regs, dst_regno);
+				insn_flags = 0; /* not restoring original register state */
 			}
 			state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
-			return 0;
-		}
-
-		if (dst_regno >= 0) {
+		} else if (dst_regno >= 0) {
 			/* restore register state from stack */
 			copy_register_state(&state->regs[dst_regno], reg);
 			/* mark reg as written since spilled pointer state likely
@@ -4708,7 +4731,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 		mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64);
 		if (dst_regno >= 0)
 			mark_reg_stack_read(env, reg_state, off, off + size, dst_regno);
+		insn_flags = 0; /* we are not restoring spilled register */
 	}
+	if (insn_flags)
+		return push_jmp_history(env, env->cur_state, insn_flags);
 	return 0;
 }
 
@@ -6868,7 +6894,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 			       BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
 	if (err)
 		return err;
-
 	return 0;
 }
 
@@ -16702,7 +16727,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * the precision needs to be propagated back in
 			 * the current state.
 			 */
-			err = err ? : push_jmp_history(env, cur);
+			if (is_jmp_point(env, env->insn_idx))
+				err = err ? : push_jmp_history(env, cur, 0);
 			err = err ? : propagate_precision(env, &sl->state);
 			if (err)
 				return err;
@@ -16927,6 +16953,9 @@ static int do_check(struct bpf_verifier_env *env)
 		u8 class;
 		int err;
 
+		/* reset current history entry on each new instruction */
+		env->cur_hist_ent = NULL;
+
 		env->prev_insn_idx = prev_insn_idx;
 		if (env->insn_idx >= insn_cnt) {
 			verbose(env, "invalid insn idx %d insn_cnt %d\n",
@@ -16966,7 +16995,7 @@ static int do_check(struct bpf_verifier_env *env)
 		}
 
 		if (is_jmp_point(env, env->insn_idx)) {
-			err = push_jmp_history(env, state);
+			err = push_jmp_history(env, state, 0);
 			if (err)
 				return err;
 		}
diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
index db6b3143338b..9b3844215a36 100644
--- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
+++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
@@ -483,11 +483,24 @@ static __u64 subprog_spill_reg_precise(void)
 
 SEC("?raw_tp")
 __success __log_level(2)
-/* precision backtracking can't currently handle stack access not through r10,
- * so we won't be able to mark stack slot fp-8 as precise, and so will
- * fallback to forcing all as precise
- */
-__msg("mark_precise: frame0: falling back to forcing all scalars precise")
+__msg("10: (0f) r1 += r7")
+__msg("mark_precise: frame0: last_idx 10 first_idx 7 subseq_idx -1")
+__msg("mark_precise: frame0: regs=r7 stack= before 9: (bf) r1 = r8")
+__msg("mark_precise: frame0: regs=r7 stack= before 8: (27) r7 *= 4")
+__msg("mark_precise: frame0: regs=r7 stack= before 7: (79) r7 = *(u64 *)(r10 -8)")
+__msg("mark_precise: frame0: parent state regs= stack=-8:  R0_w=2 R6_w=1 R8_rw=map_value(map=.data.vals,ks=4,vs=16) R10=fp0 fp-8_rw=P1")
+__msg("mark_precise: frame0: last_idx 18 first_idx 0 subseq_idx 7")
+__msg("mark_precise: frame0: regs= stack=-8 before 18: (95) exit")
+__msg("mark_precise: frame1: regs= stack= before 17: (0f) r0 += r2")
+__msg("mark_precise: frame1: regs= stack= before 16: (79) r2 = *(u64 *)(r1 +0)")
+__msg("mark_precise: frame1: regs= stack= before 15: (79) r0 = *(u64 *)(r10 -16)")
+__msg("mark_precise: frame1: regs= stack= before 14: (7b) *(u64 *)(r10 -16) = r2")
+__msg("mark_precise: frame1: regs= stack= before 13: (7b) *(u64 *)(r1 +0) = r2")
+__msg("mark_precise: frame1: regs=r2 stack= before 6: (85) call pc+6")
+__msg("mark_precise: frame0: regs=r2 stack= before 5: (bf) r2 = r6")
+__msg("mark_precise: frame0: regs=r6 stack= before 4: (07) r1 += -8")
+__msg("mark_precise: frame0: regs=r6 stack= before 3: (bf) r1 = r10")
+__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
 __naked int subprog_spill_into_parent_stack_slot_precise(void)
 {
 	asm volatile (
diff --git a/tools/testing/selftests/bpf/verifier/precise.c b/tools/testing/selftests/bpf/verifier/precise.c
index 0d84dd1f38b6..8a2ff81d8350 100644
--- a/tools/testing/selftests/bpf/verifier/precise.c
+++ b/tools/testing/selftests/bpf/verifier/precise.c
@@ -140,10 +140,11 @@
 	.result = REJECT,
 },
 {
-	"precise: ST insn causing spi > allocated_stack",
+	"precise: ST zero to stack insn is supported",
 	.insns = {
 	BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
+	/* not a register spill, so we stop precision propagation for R4 here */
 	BPF_ST_MEM(BPF_DW, BPF_REG_3, -8, 0),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
 	BPF_MOV64_IMM(BPF_REG_0, -1),
@@ -157,11 +158,11 @@
 	mark_precise: frame0: last_idx 4 first_idx 2\
 	mark_precise: frame0: regs=r4 stack= before 4\
 	mark_precise: frame0: regs=r4 stack= before 3\
-	mark_precise: frame0: regs= stack=-8 before 2\
-	mark_precise: frame0: falling back to forcing all scalars precise\
-	force_precise: frame0: forcing r0 to be precise\
 	mark_precise: frame0: last_idx 5 first_idx 5\
-	mark_precise: frame0: parent state regs= stack=:",
+	mark_precise: frame0: parent state regs=r0 stack=:\
+	mark_precise: frame0: last_idx 4 first_idx 2\
+	mark_precise: frame0: regs=r0 stack= before 4\
+	5: R0=-1 R4=0",
 	.result = VERBOSE_ACCEPT,
 	.retval = -1,
 },
@@ -169,6 +170,8 @@
 	"precise: STX insn causing spi > allocated_stack",
 	.insns = {
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	/* make later reg spill more interesting by having somewhat known scalar */
+	BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xff),
 	BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
 	BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, -8),
@@ -179,18 +182,21 @@
 	},
 	.prog_type = BPF_PROG_TYPE_XDP,
 	.flags = BPF_F_TEST_STATE_FREQ,
-	.errstr = "mark_precise: frame0: last_idx 6 first_idx 6\
+	.errstr = "mark_precise: frame0: last_idx 7 first_idx 7\
 	mark_precise: frame0: parent state regs=r4 stack=:\
-	mark_precise: frame0: last_idx 5 first_idx 3\
-	mark_precise: frame0: regs=r4 stack= before 5\
-	mark_precise: frame0: regs=r4 stack= before 4\
-	mark_precise: frame0: regs= stack=-8 before 3\
-	mark_precise: frame0: falling back to forcing all scalars precise\
-	force_precise: frame0: forcing r0 to be precise\
-	force_precise: frame0: forcing r0 to be precise\
-	force_precise: frame0: forcing r0 to be precise\
-	force_precise: frame0: forcing r0 to be precise\
-	mark_precise: frame0: last_idx 6 first_idx 6\
+	mark_precise: frame0: last_idx 6 first_idx 4\
+	mark_precise: frame0: regs=r4 stack= before 6: (b7) r0 = -1\
+	mark_precise: frame0: regs=r4 stack= before 5: (79) r4 = *(u64 *)(r10 -8)\
+	mark_precise: frame0: regs= stack=-8 before 4: (7b) *(u64 *)(r3 -8) = r0\
+	mark_precise: frame0: parent state regs=r0 stack=:\
+	mark_precise: frame0: last_idx 3 first_idx 3\
+	mark_precise: frame0: regs=r0 stack= before 3: (55) if r3 != 0x7b goto pc+0\
+	mark_precise: frame0: regs=r0 stack= before 2: (bf) r3 = r10\
+	mark_precise: frame0: regs=r0 stack= before 1: (57) r0 &= 255\
+	mark_precise: frame0: parent state regs=r0 stack=:\
+	mark_precise: frame0: last_idx 0 first_idx 0\
+	mark_precise: frame0: regs=r0 stack= before 0: (85) call bpf_get_prandom_u32#7\
+	mark_precise: frame0: last_idx 7 first_idx 7\
 	mark_precise: frame0: parent state regs= stack=:",
 	.result = VERBOSE_ACCEPT,
 	.retval = -1,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 02/10] selftests/bpf: add stack access precision test
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
  2023-11-21  0:22 ` [PATCH v2 bpf-next 01/10] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21  0:42   ` Eduard Zingerman
  2023-11-21  0:22 ` [PATCH v2 bpf-next 03/10] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Add a new selftests that validates precision tracking for stack access
instruction, using both r10-based and non-r10-based accesses. For
non-r10 ones we also make sure to have non-zero var_off to validate that
final stack offset is tracked properly in instruction history
information inside verifier.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../bpf/progs/verifier_subprog_precision.c    | 64 +++++++++++++++++--
 1 file changed, 59 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
index 9b3844215a36..a796d282077a 100644
--- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
+++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
@@ -535,14 +535,68 @@ __naked int subprog_spill_into_parent_stack_slot_precise(void)
 	);
 }
 
-__naked __noinline __used
-static __u64 subprog_with_checkpoint(void)
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("17: (0f) r1 += r0")
+__msg("mark_precise: frame0: last_idx 17 first_idx 0 subseq_idx -1")
+__msg("mark_precise: frame0: regs=r0 stack= before 16: (bf) r1 = r7")
+__msg("mark_precise: frame0: regs=r0 stack= before 15: (27) r0 *= 4")
+__msg("mark_precise: frame0: regs=r0 stack= before 14: (79) r0 = *(u64 *)(r10 -16)")
+__msg("mark_precise: frame0: regs= stack=-16 before 13: (7b) *(u64 *)(r7 -8) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 12: (79) r0 = *(u64 *)(r8 +16)")
+__msg("mark_precise: frame0: regs= stack=-16 before 11: (7b) *(u64 *)(r8 +16) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 10: (79) r0 = *(u64 *)(r7 -8)")
+__msg("mark_precise: frame0: regs= stack=-16 before 9: (7b) *(u64 *)(r10 -16) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 8: (07) r8 += -32")
+__msg("mark_precise: frame0: regs=r0 stack= before 7: (bf) r8 = r10")
+__msg("mark_precise: frame0: regs=r0 stack= before 6: (07) r7 += -8")
+__msg("mark_precise: frame0: regs=r0 stack= before 5: (bf) r7 = r10")
+__msg("mark_precise: frame0: regs=r0 stack= before 21: (95) exit")
+__msg("mark_precise: frame1: regs=r0 stack= before 20: (bf) r0 = r1")
+__msg("mark_precise: frame1: regs=r1 stack= before 4: (85) call pc+15")
+__msg("mark_precise: frame0: regs=r1 stack= before 3: (bf) r1 = r6")
+__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
+__naked int stack_slot_aliases_precision(void)
 {
 	asm volatile (
-		"r0 = 0;"
-		/* guaranteed checkpoint if BPF_F_TEST_STATE_FREQ is used */
-		"goto +0;"
+		"r6 = 1;"
+		/* pass r6 through r1 into subprog to get it back as r0;
+		 * this whole chain will have to be marked as precise later
+		 */
+		"r1 = r6;"
+		"call identity_subprog;"
+		/* let's setup two registers that are aliased to r10 */
+		"r7 = r10;"
+		"r7 += -8;"			/* r7 = r10 - 8 */
+		"r8 = r10;"
+		"r8 += -32;"			/* r8 = r10 - 32 */
+		/* now spill subprog's return value (a r6 -> r1 -> r0 chain)
+		 * a few times through different stack pointer regs, making
+		 * sure to use r10, r7, and r8 both in LDX and STX insns, and
+		 * *importantly* also using a combination of const var_off and
+		 * insn->off to validate that we record final stack slot
+		 * correctly, instead of relying on just insn->off derivation,
+		 * which is only valid for r10-based stack offset
+		 */
+		"*(u64 *)(r10 - 16) = r0;"
+		"r0 = *(u64 *)(r7 - 8);"	/* r7 - 8 == r10 - 16 */
+		"*(u64 *)(r8 + 16) = r0;"	/* r8 + 16 = r10 - 16 */
+		"r0 = *(u64 *)(r8 + 16);"
+		"*(u64 *)(r7 - 8) = r0;"
+		"r0 = *(u64 *)(r10 - 16);"
+		/* get ready to use r0 as an index into array to force precision */
+		"r0 *= 4;"
+		"r1 = %[vals];"
+		/* here r0->r1->r6 chain is forced to be precise and has to be
+		 * propagated back to the beginning, including through the
+		 * subprog call and all the stack spills and loads
+		 */
+		"r1 += r0;"
+		"r0 = *(u32 *)(r1 + 0);"
 		"exit;"
+		:
+		: __imm_ptr(vals)
+		: __clobber_common, "r6"
 	);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 03/10] bpf: fix check for attempt to corrupt spilled pointer
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
  2023-11-21  0:22 ` [PATCH v2 bpf-next 01/10] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
  2023-11-21  0:22 ` [PATCH v2 bpf-next 02/10] selftests/bpf: add stack access precision test Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21  0:22 ` [PATCH v2 bpf-next 04/10] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

When register is spilled onto a stack as a 1/2/4-byte register, we set
slot_type[BPF_REG_SIZE - 1] (plus potentially few more below it,
depending on actual spill size). So to check if some stack slot has
spilled register we need to consult slot_type[7], not slot_type[0].

To avoid the need to remember and double-check this in the future, just
use is_spilled_reg() helper.

Fixes: 638f5b90d460 ("bpf: reduce verifier memory consumption")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c22b557fe30f..357feb27e90a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4359,7 +4359,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	 * so it's aligned access and [off, off + size) are within stack limits
 	 */
 	if (!env->allow_ptr_leaks &&
-	    state->stack[spi].slot_type[0] == STACK_SPILL &&
+	    is_spilled_reg(&state->stack[spi]) &&
 	    size != BPF_REG_SIZE) {
 		verbose(env, "attempt to corrupt spilled pointer on stack\n");
 		return -EACCES;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 04/10] bpf: preserve STACK_ZERO slots on partial reg spills
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2023-11-21  0:22 ` [PATCH v2 bpf-next 03/10] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21  0:22 ` [PATCH v2 bpf-next 05/10] selftests/bpf: validate STACK_ZERO is preserved on subreg spill Andrii Nakryiko
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Eduard Zingerman

Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve it in
situations where this is possible. E.g., when spilling register as
1/2/4-byte subslots on the stack, all the remaining bytes in the stack
slot do not automatically become unknown. If we knew they contained
zeroes, we can preserve those STACK_ZERO markers.

Add a helper mark_stack_slot_misc(), similar to scrub_spilled_slot(),
but that doesn't overwrite either STACK_INVALID nor STACK_ZERO. Note
that we need to take into account possibility of being in unprivileged
mode, in which case STACK_INVALID is forced to STACK_MISC for correctness,
as treating STACK_INVALID as equivalent STACK_MISC is only enabled in
privileged mode.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 357feb27e90a..5145afb5da25 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1109,6 +1109,21 @@ static bool is_spilled_scalar_reg(const struct bpf_stack_state *stack)
 	       stack->spilled_ptr.type == SCALAR_VALUE;
 }
 
+/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
+ * case they are equivalent, or it's STACK_ZERO, in which case we preserve
+ * more precise STACK_ZERO.
+ * Note, in uprivileged mode leaving STACK_INVALID is wrong, so we take
+ * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
+ */
+static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
+{
+	if (*stype == STACK_ZERO)
+		return;
+	if (env->allow_ptr_leaks && *stype == STACK_INVALID)
+		return;
+	*stype = STACK_MISC;
+}
+
 static void scrub_spilled_slot(u8 *stype)
 {
 	if (*stype != STACK_INVALID)
@@ -4314,7 +4329,8 @@ static void copy_register_state(struct bpf_reg_state *dst, const struct bpf_reg_
 	dst->live = live;
 }
 
-static void save_register_state(struct bpf_func_state *state,
+static void save_register_state(struct bpf_verifier_env *env,
+				struct bpf_func_state *state,
 				int spi, struct bpf_reg_state *reg,
 				int size)
 {
@@ -4329,7 +4345,7 @@ static void save_register_state(struct bpf_func_state *state,
 
 	/* size < 8 bytes spill */
 	for (; i; i--)
-		scrub_spilled_slot(&state->stack[spi].slot_type[i - 1]);
+		mark_stack_slot_misc(env, &state->stack[spi].slot_type[i - 1]);
 }
 
 static bool is_bpf_st_mem(struct bpf_insn *insn)
@@ -4391,7 +4407,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	mark_stack_slot_scratched(env, spi);
 	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
 	    !register_is_null(reg) && env->bpf_capable) {
-		save_register_state(state, spi, reg, size);
+		save_register_state(env, state, spi, reg, size);
 		/* Break the relation on a narrowing spill. */
 		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
 			state->stack[spi].spilled_ptr.id = 0;
@@ -4401,7 +4417,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 
 		__mark_reg_known(&fake_reg, insn->imm);
 		fake_reg.type = SCALAR_VALUE;
-		save_register_state(state, spi, &fake_reg, size);
+		save_register_state(env, state, spi, &fake_reg, size);
 		insn_flags = 0; /* not a register spill */
 	} else if (reg && is_spillable_regtype(reg->type)) {
 		/* register containing pointer is being spilled into stack */
@@ -4414,7 +4430,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 			verbose(env, "cannot spill pointers to stack into stack frame of the caller\n");
 			return -EINVAL;
 		}
-		save_register_state(state, spi, reg, size);
+		save_register_state(env, state, spi, reg, size);
 	} else {
 		u8 type = STACK_MISC;
 
@@ -4685,6 +4701,8 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 						continue;
 					if (type == STACK_MISC)
 						continue;
+					if (type == STACK_ZERO)
+						continue;
 					if (type == STACK_INVALID && env->allow_uninit_stack)
 						continue;
 					verbose(env, "invalid read from stack off %d+%d size %d\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 05/10] selftests/bpf: validate STACK_ZERO is preserved on subreg spill
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2023-11-21  0:22 ` [PATCH v2 bpf-next 04/10] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21  0:55   ` Eduard Zingerman
  2023-11-21  0:22 ` [PATCH v2 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Add tests validating that STACK_ZERO slots are preserved when slot is
partially overwritten with subregister spill.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/progs/verifier_spill_fill.c | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
index 6115520154e3..899a74ac9093 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
@@ -450,4 +450,40 @@ l0_%=:	r1 >>= 16;					\
 	: __clobber_all);
 }
 
+SEC("raw_tp")
+__log_level(2)
+__success
+__msg("fp-8=0m??mmmm")
+__msg("fp-16=00mm??mm")
+__msg("fp-24=00mm???m")
+__naked void spill_subregs_preserve_stack_zero(void)
+{
+	asm volatile (
+		"call %[bpf_get_prandom_u32];"
+
+		/* 32-bit subreg spill with ZERO, MISC, and INVALID */
+		"*(u8 *)(r10 -1) = 0;"  /* ZERO */
+		"*(u8 *)(r10 -2) = r0;" /* MISC */
+		/* fp-3 and fp-4 stay INVALID */
+		"*(u32 *)(r10 -8) = r0;"
+
+		/* 16-bit subreg spill with ZERO, MISC, and INVALID */
+		"*(u16 *)(r10 -10) = 0;"  /* ZERO */
+		"*(u16 *)(r10 -12) = r0;" /* MISC */
+		/* fp-13 and fp-14 stay INVALID */
+		"*(u16 *)(r10 -16) = r0;"
+
+		/* 8-bit subreg spill with ZERO, MISC, and INVALID */
+		"*(u16 *)(r10 -18) = 0;"  /* ZERO */
+		"*(u16 *)(r10 -20) = r0;" /* MISC */
+		/* fp-21, fp-22, and fp-23 stay INVALID */
+		"*(u8 *)(r10 -24) = r0;"
+
+		"r0 = 0;"
+		"exit;"
+	:
+	: __imm(bpf_get_prandom_u32)
+	: __clobber_all);
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2023-11-21  0:22 ` [PATCH v2 bpf-next 05/10] selftests/bpf: validate STACK_ZERO is preserved on subreg spill Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21 16:20   ` Eduard Zingerman
  2023-11-21  0:22 ` [PATCH v2 bpf-next 07/10] selftests/bpf: validate zero preservation for sub-slot loads Andrii Nakryiko
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
stack from slot that has register spilled into it and that register has
a constant value zero, preserve that zero and mark spilled register as
precise for that. This makes spilled const zero register and STACK_ZERO
cases equivalent in their behavior.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5145afb5da25..00e6b17b71d0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4695,22 +4695,39 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 				copy_register_state(&state->regs[dst_regno], reg);
 				state->regs[dst_regno].subreg_def = subreg_def;
 			} else {
+				int spill_cnt = 0, zero_cnt = 0;
+
 				for (i = 0; i < size; i++) {
 					type = stype[(slot - i) % BPF_REG_SIZE];
-					if (type == STACK_SPILL)
+					if (type == STACK_SPILL) {
+						spill_cnt++;
 						continue;
+					}
 					if (type == STACK_MISC)
 						continue;
-					if (type == STACK_ZERO)
+					if (type == STACK_ZERO) {
+						zero_cnt++;
 						continue;
+					}
 					if (type == STACK_INVALID && env->allow_uninit_stack)
 						continue;
 					verbose(env, "invalid read from stack off %d+%d size %d\n",
 						off, i, size);
 					return -EACCES;
 				}
-				mark_reg_unknown(env, state->regs, dst_regno);
-				insn_flags = 0; /* not restoring original register state */
+
+				if (spill_cnt == size &&
+				    tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
+					__mark_reg_const_zero(&state->regs[dst_regno]);
+					/* this IS register fill, so keep insn_flags */
+				} else if (zero_cnt == size) {
+					/* similarly to mark_reg_stack_read(), preserve zeroes */
+					__mark_reg_const_zero(&state->regs[dst_regno]);
+					insn_flags = 0; /* not restoring original register state */
+				} else {
+					mark_reg_unknown(env, state->regs, dst_regno);
+					insn_flags = 0; /* not restoring original register state */
+				}
 			}
 			state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
 		} else if (dst_regno >= 0) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 07/10] selftests/bpf: validate zero preservation for sub-slot loads
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2023-11-21  0:22 ` [PATCH v2 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21 16:20   ` Eduard Zingerman
  2023-11-21  0:22 ` [PATCH v2 bpf-next 08/10] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Validate that 1-, 2-, and 4-byte loads from stack slots not aligned on
8-byte boundary still preserve zero, when loading from all-STACK_ZERO
sub-slots, or when stack sub-slots are covered by spilled register with
known constant zero value.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/progs/verifier_spill_fill.c | 62 +++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
index 899a74ac9093..60ad2e0247b4 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
@@ -486,4 +486,66 @@ __naked void spill_subregs_preserve_stack_zero(void)
 	: __clobber_all);
 }
 
+char single_byte_buf[1] SEC(".data.single_byte_buf");
+
+SEC("raw_tp")
+__log_level(2)
+__success
+__naked void partial_stack_load_preserves_zeros(void)
+{
+	asm volatile (
+		/* fp-8 is all STACK_ZERO */
+		"*(u64 *)(r10 -8) = 0;"
+
+		/* fp-16 is const zero register */
+		"r0 = 0;"
+		"*(u64 *)(r10 -16) = r0;"
+
+		/* load single U8 from non-aligned STACK_ZERO slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u8 *)(r10 -1);"
+		"r1 += r2;" /* this should be fine */
+
+		/* load single U8 from non-aligned ZERO REG slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u8 *)(r10 -9);"
+		"r1 += r2;" /* this should be fine */
+
+		/* load single U16 from non-aligned STACK_ZERO slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u16 *)(r10 -2);"
+		"r1 += r2;" /* this should be fine */
+
+		/* load single U16 from non-aligned ZERO REG slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u16 *)(r10 -10);"
+		"r1 += r2;" /* this should be fine */
+
+		/* load single U32 from non-aligned STACK_ZERO slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u32 *)(r10 -4);"
+		"r1 += r2;" /* this should be fine */
+
+		/* load single U32 from non-aligned ZERO REG slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u32 *)(r10 -12);"
+		"r1 += r2;" /* this should be fine */
+
+		/* for completeness, load U64 from STACK_ZERO slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u64 *)(r10 -8);"
+		"r1 += r2;" /* this should be fine */
+
+		/* for completeness, load U64 from ZERO REG slot */
+		"r1 = %[single_byte_buf];"
+		"r2 = *(u64 *)(r10 -16);"
+		"r1 += r2;" /* this should be fine */
+
+		"r0 = 0;"
+		"exit;"
+	:
+	: __imm_ptr(single_byte_buf)
+	: __clobber_common);
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 08/10] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2023-11-21  0:22 ` [PATCH v2 bpf-next 07/10] selftests/bpf: validate zero preservation for sub-slot loads Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21 20:38   ` Alexei Starovoitov
  2023-11-21  0:22 ` [PATCH v2 bpf-next 09/10] selftests/bpf: validate precision logic in partial_stack_load_preserves_zeros Andrii Nakryiko
  2023-11-21  0:22 ` [PATCH v2 bpf-next 10/10] bpf: use common instruction history across all states Andrii Nakryiko
  9 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Eduard Zingerman

Now that precision backtracing is supporting register spill/fill to/from
stack, there is another oportunity to be exploited here: minimizing
precise STACK_ZERO cases. With a simple code change we can rely on
initially imprecise register spill tracking for cases when register
spilled to stack was a known zero.

This is a very common case for initializing on the stack variables,
including rather large structures. Often times zero has no special
meaning for the subsequent BPF program logic and is often overwritten
with non-zero values soon afterwards. But due to STACK_ZERO vs
STACK_MISC tracking, such initial zero initialization actually causes
duplication of verifier states as STACK_ZERO is clearly different than
STACK_MISC or spilled SCALAR_VALUE register.

The effect of this (now) trivial change is huge, as can be seen below.
These are differences between BPF selftests, Cilium, and Meta-internal
BPF object files relative to previous patch in this series. You can see
improvements ranging from single-digit percentage improvement for
instructions and states, all the way to 50-60% reduction for some of
Meta-internal host agent programs, and even some Cilium programs.

For Meta-internal ones I left only the differences for largest BPF
object files by states/instructions, as there were too many differences
in the overall output. All the differences were improvements, reducting
number of states and thus instructions validated.

Note, Meta-internal BPF object file names are not printed below.
Many copies of balancer_ingress are actually many different
configurations of Katran, so they are different BPF programs, which
explains state reduction going from -16% all the way to 31%, depending
on BPF program logic complexity.

I also tooked a closer look at a few small-ish BPF programs to validate
the behavior. Let's take bpf_iter_netrlink.bpf.o (first row below).
While it's just 8 vs 5 states, verifier log is still pretty long to
include it here. But the reduction in states is due to the following
piece of C code:

        unsigned long ino;

	...

        sk = s->sk_socket;
        if (!sk) {
                ino = 0;
        } else {
                inode = SOCK_INODE(sk);
                bpf_probe_read_kernel(&ino, sizeof(ino), &inode->i_ino);
        }
        BPF_SEQ_PRINTF(seq, "%-8u %-8lu\n", s->sk_drops.counter, ino);
	return 0;

You can see that in some situations `ino` is zero-initialized, while in
others it's unknown value filled out by bpf_probe_read_kernel(). Before
this change both branches have to be validated twice. Once with
(precise) ino == 0, due to eager STACK_ZERO logic, and then again for
when ino is just STACK_MISC. But BPF_SEQ_PRINTF() doesn't care about
precise value of ino, so with the change in this patch verifier is able
to prune states from one of the branches, reducing number of total
states (and instructions) required for successful validation.

Similar principle applies to bigger real-world applications, just at
a much larger scale.

SELFTESTS
=========
File                                     Program                  Insns (A)  Insns (B)  Insns    (DIFF)  States (A)  States (B)  States (DIFF)
---------------------------------------  -----------------------  ---------  ---------  ---------------  ----------  ----------  -------------
bpf_iter_netlink.bpf.linked3.o           dump_netlink                   148        104    -44 (-29.73%)           8           5   -3 (-37.50%)
bpf_iter_unix.bpf.linked3.o              dump_unix                     8474       8404     -70 (-0.83%)         151         147    -4 (-2.65%)
bpf_loop.bpf.linked3.o                   stack_check                    560        324   -236 (-42.14%)          42          24  -18 (-42.86%)
local_storage_bench.bpf.linked3.o        get_local                      120         77    -43 (-35.83%)           9           6   -3 (-33.33%)
loop6.bpf.linked3.o                      trace_virtqueue_add_sgs      10167       9868    -299 (-2.94%)         226         206   -20 (-8.85%)
pyperf600_bpf_loop.bpf.linked3.o         on_event                      4872       3423  -1449 (-29.74%)         322         229  -93 (-28.88%)
strobemeta.bpf.linked3.o                 on_event                    180697     176036   -4661 (-2.58%)        4780        4734   -46 (-0.96%)
test_cls_redirect.bpf.linked3.o          cls_redirect                 65594      65401    -193 (-0.29%)        4230        4212   -18 (-0.43%)
test_global_func_args.bpf.linked3.o      test_cls                       145        136      -9 (-6.21%)          10           9   -1 (-10.00%)
test_l4lb.bpf.linked3.o                  balancer_ingress              4760       2612  -2148 (-45.13%)         113         102   -11 (-9.73%)
test_l4lb_noinline.bpf.linked3.o         balancer_ingress              4845       4877     +32 (+0.66%)         219         221    +2 (+0.91%)
test_l4lb_noinline_dynptr.bpf.linked3.o  balancer_ingress              2072       2087     +15 (+0.72%)          97          98    +1 (+1.03%)
test_seg6_loop.bpf.linked3.o             __add_egr_x                  12440       9975  -2465 (-19.82%)         364         353   -11 (-3.02%)
test_tcp_hdr_options.bpf.linked3.o       estab                         2558       2572     +14 (+0.55%)         179         180    +1 (+0.56%)
test_xdp_dynptr.bpf.linked3.o            _xdp_tx_iptunnel               645        596     -49 (-7.60%)          26          24    -2 (-7.69%)
test_xdp_noinline.bpf.linked3.o          balancer_ingress_v6           3520       3516      -4 (-0.11%)         216         216    +0 (+0.00%)
xdp_synproxy_kern.bpf.linked3.o          syncookie_tc                 82661      81241   -1420 (-1.72%)        5073        5155   +82 (+1.62%)
xdp_synproxy_kern.bpf.linked3.o          syncookie_xdp                84964      82297   -2667 (-3.14%)        5130        5157   +27 (+0.53%)

META-INTERNAL
=============
Program                                 Insns (A)  Insns (B)  Insns      (DIFF)  States (A)  States (B)  States   (DIFF)
--------------------------------------  ---------  ---------  -----------------  ----------  ----------  ---------------
balancer_ingress                            27925      23608    -4317 (-15.46%)        1488        1482      -6 (-0.40%)
balancer_ingress                            31824      27546    -4278 (-13.44%)        1658        1652      -6 (-0.36%)
balancer_ingress                            32213      27935    -4278 (-13.28%)        1689        1683      -6 (-0.36%)
balancer_ingress                            32213      27935    -4278 (-13.28%)        1689        1683      -6 (-0.36%)
balancer_ingress                            31824      27546    -4278 (-13.44%)        1658        1652      -6 (-0.36%)
balancer_ingress                            38647      29562    -9085 (-23.51%)        2069        1835   -234 (-11.31%)
balancer_ingress                            38647      29562    -9085 (-23.51%)        2069        1835   -234 (-11.31%)
balancer_ingress                            40339      30792    -9547 (-23.67%)        2193        1934   -259 (-11.81%)
balancer_ingress                            37321      29055    -8266 (-22.15%)        1972        1795    -177 (-8.98%)
balancer_ingress                            38176      29753    -8423 (-22.06%)        2008        1831    -177 (-8.81%)
balancer_ingress                            29193      20910    -8283 (-28.37%)        1599        1422   -177 (-11.07%)
balancer_ingress                            30013      21452    -8561 (-28.52%)        1645        1447   -198 (-12.04%)
balancer_ingress                            28691      24290    -4401 (-15.34%)        1545        1531     -14 (-0.91%)
balancer_ingress                            34223      28965    -5258 (-15.36%)        1984        1875    -109 (-5.49%)
balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
balancer_ingress                            35868      26455    -9413 (-26.24%)        2140        1827   -313 (-14.63%)
balancer_ingress                            35868      26455    -9413 (-26.24%)        2140        1827   -313 (-14.63%)
balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
balancer_ingress                            35481      26158    -9323 (-26.28%)        2095        1806   -289 (-13.79%)
balancer_ingress                            34844      29485    -5359 (-15.38%)        2036        1918    -118 (-5.80%)
fbflow_egress                                3256       2652     -604 (-18.55%)         218         192    -26 (-11.93%)
fbflow_ingress                               1026        944       -82 (-7.99%)          70          63     -7 (-10.00%)
sslwall_tc_egress                            8424       7360    -1064 (-12.63%)         498         458     -40 (-8.03%)
syar_accept_protect                         15040       9539    -5501 (-36.58%)         364         220   -144 (-39.56%)
syar_connect_tcp_v6                         15036       9535    -5501 (-36.59%)         360         216   -144 (-40.00%)
syar_connect_udp_v4                         15039       9538    -5501 (-36.58%)         361         217   -144 (-39.89%)
syar_connect_connect4_protect4              24805      15833    -8972 (-36.17%)         756         480   -276 (-36.51%)
syar_lsm_file_open                         167772     151813    -15959 (-9.51%)        1836        1667    -169 (-9.20%)
syar_namespace_create_new                   14805       9304    -5501 (-37.16%)         353         209   -144 (-40.79%)
syar_python3_detect                         17531      12030    -5501 (-31.38%)         391         247   -144 (-36.83%)
syar_ssh_post_fork                          16412      10911    -5501 (-33.52%)         405         261   -144 (-35.56%)
syar_enter_execve                           14728       9227    -5501 (-37.35%)         345         201   -144 (-41.74%)
syar_enter_execveat                         14728       9227    -5501 (-37.35%)         345         201   -144 (-41.74%)
syar_exit_execve                            16622      11121    -5501 (-33.09%)         376         232   -144 (-38.30%)
syar_exit_execveat                          16622      11121    -5501 (-33.09%)         376         232   -144 (-38.30%)
syar_syscalls_kill                          15288       9787    -5501 (-35.98%)         398         254   -144 (-36.18%)
syar_task_enter_pivot_root                  14898       9397    -5501 (-36.92%)         357         213   -144 (-40.34%)
syar_syscalls_setreuid                      16678      11177    -5501 (-32.98%)         429         285   -144 (-33.57%)
syar_syscalls_setuid                        16678      11177    -5501 (-32.98%)         429         285   -144 (-33.57%)
syar_syscalls_process_vm_readv              14959       9458    -5501 (-36.77%)         364         220   -144 (-39.56%)
syar_syscalls_process_vm_writev             15757      10256    -5501 (-34.91%)         390         246   -144 (-36.92%)
do_uprobe                                   15519      10018    -5501 (-35.45%)         373         229   -144 (-38.61%)
edgewall                                   179715      55783  -123932 (-68.96%)       12607        3999  -8608 (-68.28%)
bictcp_state                                 7570       4131    -3439 (-45.43%)         496         269   -227 (-45.77%)
cubictcp_state                               7570       4131    -3439 (-45.43%)         496         269   -227 (-45.77%)
tcp_rate_skb_delivered                        447        272     -175 (-39.15%)          29          18    -11 (-37.93%)
kprobe__bbr_set_state                        4566       2615    -1951 (-42.73%)         209         124    -85 (-40.67%)
kprobe__bictcp_state                         4566       2615    -1951 (-42.73%)         209         124    -85 (-40.67%)
inet_sock_set_state                          1501       1337     -164 (-10.93%)          93          85      -8 (-8.60%)
tcp_retransmit_skb                           1145        981     -164 (-14.32%)          67          59     -8 (-11.94%)
tcp_retransmit_synack                        1183        951     -232 (-19.61%)          67          55    -12 (-17.91%)
bpf_tcptuner                                 1459       1187     -272 (-18.64%)          99          80    -19 (-19.19%)
tw_egress                                     801        776       -25 (-3.12%)          69          66      -3 (-4.35%)
tw_ingress                                    795        770       -25 (-3.14%)          69          66      -3 (-4.35%)
ttls_tc_ingress                             19025      19383      +358 (+1.88%)         470         465      -5 (-1.06%)
ttls_nat_egress                               490        299     -191 (-38.98%)          33          20    -13 (-39.39%)
ttls_nat_ingress                              448        285     -163 (-36.38%)          32          21    -11 (-34.38%)
tw_twfw_egress                             511127     212071  -299056 (-58.51%)       16733        8504  -8229 (-49.18%)
tw_twfw_ingress                            500095     212069  -288026 (-57.59%)       16223        8504  -7719 (-47.58%)
tw_twfw_tc_eg                              511113     212064  -299049 (-58.51%)       16732        8504  -8228 (-49.18%)
tw_twfw_tc_in                              500095     212069  -288026 (-57.59%)       16223        8504  -7719 (-47.58%)
tw_twfw_egress                              12632      12435      -197 (-1.56%)         276         260     -16 (-5.80%)
tw_twfw_ingress                             12631      12454      -177 (-1.40%)         278         261     -17 (-6.12%)
tw_twfw_tc_eg                               12595      12435      -160 (-1.27%)         274         259     -15 (-5.47%)
tw_twfw_tc_in                               12631      12454      -177 (-1.40%)         278         261     -17 (-6.12%)
tw_xdp_dump                                   266        209      -57 (-21.43%)           9           8     -1 (-11.11%)

CILIUM
=========
File           Program                           Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States  (DIFF)
-------------  --------------------------------  ---------  ---------  ----------------  ----------  ----------  --------------
bpf_host.o     cil_to_netdev                          6047       4578   -1469 (-24.29%)         362         249  -113 (-31.22%)
bpf_host.o     handle_lxc_traffic                     2227       1585    -642 (-28.83%)         156         103   -53 (-33.97%)
bpf_host.o     tail_handle_ipv4_from_netdev           2244       1458    -786 (-35.03%)         163         106   -57 (-34.97%)
bpf_host.o     tail_handle_nat_fwd_ipv4              21022      10479  -10543 (-50.15%)        1289         670  -619 (-48.02%)
bpf_host.o     tail_handle_nat_fwd_ipv6              15433      11375   -4058 (-26.29%)         905         643  -262 (-28.95%)
bpf_host.o     tail_ipv4_host_policy_ingress          2219       1367    -852 (-38.40%)         161          96   -65 (-40.37%)
bpf_host.o     tail_nodeport_nat_egress_ipv4         22460      19862   -2598 (-11.57%)        1469        1293  -176 (-11.98%)
bpf_host.o     tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
bpf_host.o     tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
bpf_host.o     tail_nodeport_nat_ipv6_egress          3702       3542     -160 (-4.32%)         215         205    -10 (-4.65%)
bpf_lxc.o      tail_handle_nat_fwd_ipv4              21022      10479  -10543 (-50.15%)        1289         670  -619 (-48.02%)
bpf_lxc.o      tail_handle_nat_fwd_ipv6              15433      11375   -4058 (-26.29%)         905         643  -262 (-28.95%)
bpf_lxc.o      tail_ipv4_ct_egress                    5073       3374   -1699 (-33.49%)         262         172   -90 (-34.35%)
bpf_lxc.o      tail_ipv4_ct_ingress                   5093       3385   -1708 (-33.54%)         262         172   -90 (-34.35%)
bpf_lxc.o      tail_ipv4_ct_ingress_policy_only       5093       3385   -1708 (-33.54%)         262         172   -90 (-34.35%)
bpf_lxc.o      tail_ipv6_ct_egress                    4593       3878    -715 (-15.57%)         194         151   -43 (-22.16%)
bpf_lxc.o      tail_ipv6_ct_ingress                   4606       3891    -715 (-15.52%)         194         151   -43 (-22.16%)
bpf_lxc.o      tail_ipv6_ct_ingress_policy_only       4606       3891    -715 (-15.52%)         194         151   -43 (-22.16%)
bpf_lxc.o      tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
bpf_lxc.o      tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
bpf_overlay.o  tail_handle_nat_fwd_ipv4              20524      10114  -10410 (-50.72%)        1271         638  -633 (-49.80%)
bpf_overlay.o  tail_nodeport_nat_egress_ipv4         22718      19490   -3228 (-14.21%)        1475        1275  -200 (-13.56%)
bpf_overlay.o  tail_nodeport_nat_ingress_ipv4         5526       3534   -1992 (-36.05%)         366         243  -123 (-33.61%)
bpf_overlay.o  tail_nodeport_nat_ingress_ipv6         5132       4256    -876 (-17.07%)         241         219    -22 (-9.13%)
bpf_overlay.o  tail_nodeport_nat_ipv6_egress          3638       3548      -90 (-2.47%)         209         203     -6 (-2.87%)
bpf_overlay.o  tail_rev_nodeport_lb4                  4368       3820    -548 (-12.55%)         248         215   -33 (-13.31%)
bpf_overlay.o  tail_rev_nodeport_lb6                  2867       2428    -439 (-15.31%)         167         140   -27 (-16.17%)
bpf_sock.o     cil_sock6_connect                      1718       1703      -15 (-0.87%)         100          99     -1 (-1.00%)
bpf_xdp.o      tail_handle_nat_fwd_ipv4              12917      12443     -474 (-3.67%)         875         849    -26 (-2.97%)
bpf_xdp.o      tail_handle_nat_fwd_ipv6              13515      13264     -251 (-1.86%)         715         702    -13 (-1.82%)
bpf_xdp.o      tail_lb_ipv4                          39492      36367    -3125 (-7.91%)        2430        2251   -179 (-7.37%)
bpf_xdp.o      tail_lb_ipv6                          80441      78058    -2383 (-2.96%)        3647        3523   -124 (-3.40%)
bpf_xdp.o      tail_nodeport_ipv6_dsr                 1038        901    -137 (-13.20%)          61          55     -6 (-9.84%)
bpf_xdp.o      tail_nodeport_nat_egress_ipv4         13027      12096     -931 (-7.15%)         868         809    -59 (-6.80%)
bpf_xdp.o      tail_nodeport_nat_ingress_ipv4         7617       5900   -1717 (-22.54%)         522         413  -109 (-20.88%)
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6         7575       7395     -180 (-2.38%)         383         374     -9 (-2.35%)
bpf_xdp.o      tail_rev_nodeport_lb4                  6808       6739      -69 (-1.01%)         403         396     -7 (-1.74%)
bpf_xdp.o      tail_rev_nodeport_lb6                 16173      15847     -326 (-2.02%)        1010         990    -20 (-1.98%)

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 00e6b17b71d0..ddb0888e7453 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4405,8 +4405,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 		return err;
 
 	mark_stack_slot_scratched(env, spi);
-	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
-	    !register_is_null(reg) && env->bpf_capable) {
+	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) {
 		save_register_state(env, state, spi, reg, size);
 		/* Break the relation on a narrowing spill. */
 		if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
@@ -4455,7 +4454,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 		/* when we zero initialize stack slots mark them as such */
 		if ((reg && register_is_null(reg)) ||
 		    (!reg && is_bpf_st_mem(insn) && insn->imm == 0)) {
-			/* backtracking doesn't work for STACK_ZERO yet. */
+			/* STACK_ZERO case happened because register spill
+			 * wasn't properly aligned at the stack slot boundary,
+			 * so it's not a register spill anymore; force
+			 * originating register to be precise to make
+			 * STACK_ZERO correct for subsequent states
+			 */
 			err = mark_chain_precision(env, value_regno);
 			if (err)
 				return err;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 09/10] selftests/bpf: validate precision logic in partial_stack_load_preserves_zeros
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2023-11-21  0:22 ` [PATCH v2 bpf-next 08/10] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21 16:20   ` Eduard Zingerman
  2023-11-21  0:22 ` [PATCH v2 bpf-next 10/10] bpf: use common instruction history across all states Andrii Nakryiko
  9 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Enhance partial_stack_load_preserves_zeros subtest with detailed
precision propagation log checks. We know expect fp-16 to be spilled,
initially imprecise, zero const register, which is later marked as
precise even when partial stack slot load is performed, even if it's not
a register fill (!).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/progs/verifier_spill_fill.c     | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
index 60ad2e0247b4..e443009bf9b6 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c
@@ -491,6 +491,21 @@ char single_byte_buf[1] SEC(".data.single_byte_buf");
 SEC("raw_tp")
 __log_level(2)
 __success
+/* make sure fp-8 is all STACK_ZERO */
+__msg("2: (7a) *(u64 *)(r10 -8) = 0          ; R10=fp0 fp-8_w=00000000")
+/* but fp-16 is spilled IMPRECISE zero const reg */
+__msg("4: (7b) *(u64 *)(r10 -16) = r0        ; R0_w=0 R10=fp0 fp-16_w=0")
+/* and now check that precision propagation works even for such tricky case */
+__msg("9: (71) r2 = *(u8 *)(r10 -9)          ; R2_w=P0 R10=fp0 fp-16_w=0")
+__msg("10: (0f) r1 += r2")
+__msg("mark_precise: frame0: last_idx 10 first_idx 0 subseq_idx -1")
+__msg("mark_precise: frame0: regs=r2 stack= before 9: (71) r2 = *(u8 *)(r10 -9)")
+__msg("mark_precise: frame0: regs= stack=-16 before 8: (bf) r1 = r6")
+__msg("mark_precise: frame0: regs= stack=-16 before 7: (0f) r1 += r2")
+__msg("mark_precise: frame0: regs= stack=-16 before 6: (71) r2 = *(u8 *)(r10 -1)")
+__msg("mark_precise: frame0: regs= stack=-16 before 5: (bf) r1 = r6")
+__msg("mark_precise: frame0: regs= stack=-16 before 4: (7b) *(u64 *)(r10 -16) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 3: (b7) r0 = 0")
 __naked void partial_stack_load_preserves_zeros(void)
 {
 	asm volatile (
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 bpf-next 10/10] bpf: use common instruction history across all states
  2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2023-11-21  0:22 ` [PATCH v2 bpf-next 09/10] selftests/bpf: validate precision logic in partial_stack_load_preserves_zeros Andrii Nakryiko
@ 2023-11-21  0:22 ` Andrii Nakryiko
  2023-11-21 16:20   ` Eduard Zingerman
  9 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21  0:22 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Instead of allocating and copying instruction history each time we
enqueue child verifier state, switch to a model where we use one common
dynamically sized array of instruction history entries across all states.

The key observation for proving this is correct is that instruction
history is only relevant while state is active, which means it either is
a current state (and thus we are actively modifying instruction history
and no other state can interfere with us) or we are checkpointed state
with some children still active (either enqueued or being current).

In the latter case our portion of instruction history is finalized and
won't change or grow, so as long as we keep it immutable until the state
is finalized, we are good.

Now, when state is finalized and is put into state hash for potentially
future pruning lookups, instruction history is not used anymore. This is
because instruction history is only used by precision marking logic, and
we never modify precision markings for finalized states.

So, instead of each state having its own small instruction history, we
keep a global dynamically-sized instruction history, where each state in
current DFS path from root to active state remembers its portion of
instruction history.  Current state can append to this history, but
cannot modify any of its parent histories.

Because the insn_hist array can be grown through realloc, states don't
keep pointers, they instead maintain two indices, [start, end), into
global instruction history array. End is exclusive index, so
`start == end` means there is no relevant instruction history.

This eliminates a lot of allocations and minimizes overall memory usage.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf_verifier.h | 19 ++++----
 kernel/bpf/verifier.c        | 92 ++++++++++++++++--------------------
 2 files changed, 53 insertions(+), 58 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1901056ee61c..cd139a133f6e 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -311,7 +311,7 @@ struct bpf_func_state {
 
 #define MAX_CALL_FRAMES 8
 
-/* instruction history flags, used in bpf_jmp_history_entry.flags field */
+/* instruction history flags, used in bpf_insn_hist_entry.flags field */
 enum {
 	/* instruction references stack slot through PTR_TO_STACK register;
 	 * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
@@ -329,7 +329,7 @@ enum {
 static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
 static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
 
-struct bpf_jmp_history_entry {
+struct bpf_insn_hist_entry {
 	u32 idx;
 	/* insn idx can't be bigger than 1 million */
 	u32 prev_idx : 22;
@@ -414,13 +414,14 @@ struct bpf_verifier_state {
 	 * See get_loop_entry() for more information.
 	 */
 	struct bpf_verifier_state *loop_entry;
-	/* jmp history recorded from first to last.
-	 * backtracking is using it to go from last to first.
-	 * For most states jmp_history_cnt is [0-3].
+	/* Sub-range of env->insn_hist[] corresponding to this state's
+	 * instruction history.
+	 * Backtracking is using it to go from last to first.
+	 * For most states instruction history is short, 0-3 instructions.
 	 * For loops can go up to ~40.
 	 */
-	struct bpf_jmp_history_entry *jmp_history;
-	u32 jmp_history_cnt;
+	u32 insn_hist_start;
+	u32 insn_hist_end;
 	u32 dfs_depth;
 };
 
@@ -657,7 +658,9 @@ struct bpf_verifier_env {
 		int cur_stack;
 	} cfg;
 	struct backtrack_state bt;
-	struct bpf_jmp_history_entry *cur_hist_ent;
+	struct bpf_insn_hist_entry *insn_hist;
+	struct bpf_insn_hist_entry *cur_hist_ent;
+	u32 insn_hist_cap;
 	u32 pass_cnt; /* number of times do_check() was called */
 	u32 subprog_cnt;
 	/* number of instructions analyzed by the verifier */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ddb0888e7453..5f4c238d62db 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1292,13 +1292,6 @@ static void free_func_state(struct bpf_func_state *state)
 	kfree(state);
 }
 
-static void clear_jmp_history(struct bpf_verifier_state *state)
-{
-	kfree(state->jmp_history);
-	state->jmp_history = NULL;
-	state->jmp_history_cnt = 0;
-}
-
 static void free_verifier_state(struct bpf_verifier_state *state,
 				bool free_self)
 {
@@ -1308,7 +1301,6 @@ static void free_verifier_state(struct bpf_verifier_state *state,
 		free_func_state(state->frame[i]);
 		state->frame[i] = NULL;
 	}
-	clear_jmp_history(state);
 	if (free_self)
 		kfree(state);
 }
@@ -1334,13 +1326,6 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	struct bpf_func_state *dst;
 	int i, err;
 
-	dst_state->jmp_history = copy_array(dst_state->jmp_history, src->jmp_history,
-					  src->jmp_history_cnt, sizeof(*dst_state->jmp_history),
-					  GFP_USER);
-	if (!dst_state->jmp_history)
-		return -ENOMEM;
-	dst_state->jmp_history_cnt = src->jmp_history_cnt;
-
 	/* if dst has more stack frames then src frame, free them, this is also
 	 * necessary in case of exceptional exits using bpf_throw.
 	 */
@@ -1357,6 +1342,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
 	dst_state->last_insn_idx = src->last_insn_idx;
+	dst_state->insn_hist_start = src->insn_hist_start;
+	dst_state->insn_hist_end = src->insn_hist_end;
 	dst_state->dfs_depth = src->dfs_depth;
 	dst_state->used_as_loop_entry = src->used_as_loop_entry;
 	for (i = 0; i <= src->curframe; i++) {
@@ -3214,11 +3201,10 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
 }
 
 /* for any branch, call, exit record the history of jmps in the given state */
-static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
-			    int insn_flags)
+static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
+			     int insn_flags)
 {
-	u32 cnt = cur->jmp_history_cnt;
-	struct bpf_jmp_history_entry *p;
+	struct bpf_insn_hist_entry *p;
 	size_t alloc_size;
 
 	/* combine instruction flags if we already recorded this instruction */
@@ -3234,46 +3220,51 @@ static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_st
 		return 0;
 	}
 
-	cnt++;
-	alloc_size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p)));
-	p = krealloc(cur->jmp_history, alloc_size, GFP_USER);
-	if (!p)
-		return -ENOMEM;
-	cur->jmp_history = p;
+	if (cur->insn_hist_end + 1 > env->insn_hist_cap) {
+		alloc_size = size_mul(cur->insn_hist_end + 1, sizeof(*p));
+		alloc_size = kmalloc_size_roundup(alloc_size);
+		p = krealloc(env->insn_hist, alloc_size, GFP_USER);
+		if (!p)
+			return -ENOMEM;
+		env->insn_hist = p;
+		env->insn_hist_cap = alloc_size / sizeof(*p);
+	}
 
-	p = &cur->jmp_history[cnt - 1];
+	p = &env->insn_hist[cur->insn_hist_end];
 	p->idx = env->insn_idx;
 	p->prev_idx = env->prev_insn_idx;
 	p->flags = insn_flags;
-	cur->jmp_history_cnt = cnt;
+	cur->insn_hist_end++;
 	env->cur_hist_ent = p;
 
 	return 0;
 }
 
-static struct bpf_jmp_history_entry *get_jmp_hist_entry(struct bpf_verifier_state *st,
-						        u32 hist_end, int insn_idx)
+static struct bpf_insn_hist_entry *get_insn_hist_entry(struct bpf_verifier_env *env,
+						       u32 hist_end, int insn_idx)
 {
-	if (hist_end > 0 && st->jmp_history[hist_end - 1].idx == insn_idx)
-		return &st->jmp_history[hist_end - 1];
+	if (hist_end > 0 && env->insn_hist[hist_end - 1].idx == insn_idx)
+		return &env->insn_hist[hist_end - 1];
 	return NULL;
 }
 
 /* Backtrack one insn at a time. If idx is not at the top of recorded
  * history then previous instruction came from straight line execution.
  */
-static int get_prev_insn_idx(struct bpf_verifier_state *st, int i,
-			     u32 *history)
+static int get_prev_insn_idx(const struct bpf_verifier_env *env,
+			     struct bpf_verifier_state *st,
+			     int insn_idx, u32 *hist_endp)
 {
-	u32 cnt = *history;
+	u32 hist_end = *hist_endp;
+	u32 cnt = hist_end - st->insn_hist_start;
 
-	if (cnt && st->jmp_history[cnt - 1].idx == i) {
-		i = st->jmp_history[cnt - 1].prev_idx;
-		(*history)--;
+	if (cnt && env->insn_hist[hist_end - 1].idx == insn_idx) {
+		insn_idx = env->insn_hist[hist_end - 1].prev_idx;
+		(*hist_endp)--;
 	} else {
-		i--;
+		insn_idx--;
 	}
-	return i;
+	return insn_idx;
 }
 
 static const char *disasm_kfunc_name(void *data, const struct bpf_insn *insn)
@@ -3462,7 +3453,7 @@ static void fmt_stack_mask(char *buf, ssize_t buf_sz, u64 stack_mask)
  *   - *was* processed previously during backtracking.
  */
 static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
-			  struct bpf_jmp_history_entry *hist, struct backtrack_state *bt)
+			  struct bpf_insn_hist_entry *hist, struct backtrack_state *bt)
 {
 	const struct bpf_insn_cbs cbs = {
 		.cb_call	= disasm_kfunc_name,
@@ -3953,7 +3944,7 @@ static int mark_precise_scalar_ids(struct bpf_verifier_env *env, struct bpf_veri
  * SCALARS, as well as any other registers and slots that contribute to
  * a tracked state of given registers/stack slots, depending on specific BPF
  * assembly instructions (see backtrack_insns() for exact instruction handling
- * logic). This backtracking relies on recorded jmp_history and is able to
+ * logic). This backtracking relies on recorded insn_hist and is able to
  * traverse entire chain of parent states. This process ends only when all the
  * necessary registers/slots and their transitive dependencies are marked as
  * precise.
@@ -4070,8 +4061,8 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 
 	for (;;) {
 		DECLARE_BITMAP(mask, 64);
-		u32 history = st->jmp_history_cnt;
-		struct bpf_jmp_history_entry *hist;
+		u32 hist_end = st->insn_hist_end;
+		struct bpf_insn_hist_entry *hist;
 
 		if (env->log.level & BPF_LOG_LEVEL2) {
 			verbose(env, "mark_precise: frame%d: last_idx %d first_idx %d subseq_idx %d \n",
@@ -4135,7 +4126,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 				err = 0;
 				skip_first = false;
 			} else {
-				hist = get_jmp_hist_entry(st, history, i);
+				hist = get_insn_hist_entry(env, hist_end, i);
 				err = backtrack_insn(env, i, subseq_idx, hist, bt);
 			}
 			if (err == -ENOTSUPP) {
@@ -4154,7 +4145,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
 			if (i == first_idx)
 				break;
 			subseq_idx = i;
-			i = get_prev_insn_idx(st, i, &history);
+			i = get_prev_insn_idx(env, st, i, &hist_end);
 			if (i >= env->prog->len) {
 				/* This can happen if backtracking reached insn 0
 				 * and there are still reg_mask or stack_mask
@@ -4473,7 +4464,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	}
 
 	if (insn_flags)
-		return push_jmp_history(env, env->cur_state, insn_flags);
+		return push_insn_history(env, env->cur_state, insn_flags);
 	return 0;
 }
 
@@ -4773,7 +4764,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 		insn_flags = 0; /* we are not restoring spilled register */
 	}
 	if (insn_flags)
-		return push_jmp_history(env, env->cur_state, insn_flags);
+		return push_insn_history(env, env->cur_state, insn_flags);
 	return 0;
 }
 
@@ -16767,7 +16758,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * the current state.
 			 */
 			if (is_jmp_point(env, env->insn_idx))
-				err = err ? : push_jmp_history(env, cur, 0);
+				err = err ? : push_insn_history(env, cur, 0);
 			err = err ? : propagate_precision(env, &sl->state);
 			if (err)
 				return err;
@@ -16866,8 +16857,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 
 	cur->parent = new;
 	cur->first_insn_idx = insn_idx;
+	cur->insn_hist_start = cur->insn_hist_end;
 	cur->dfs_depth = new->dfs_depth + 1;
-	clear_jmp_history(cur);
 	new_sl->next = *explored_state(env, insn_idx);
 	*explored_state(env, insn_idx) = new_sl;
 	/* connect new state to parentage chain. Current frame needs all
@@ -17034,7 +17025,7 @@ static int do_check(struct bpf_verifier_env *env)
 		}
 
 		if (is_jmp_point(env, env->insn_idx)) {
-			err = push_jmp_history(env, state, 0);
+			err = push_insn_history(env, state, 0);
 			if (err)
 				return err;
 		}
@@ -20566,6 +20557,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (!is_priv)
 		mutex_unlock(&bpf_verifier_lock);
 	vfree(env->insn_aux_data);
+	kvfree(env->insn_hist);
 err_free_env:
 	kfree(env);
 	return ret;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 02/10] selftests/bpf: add stack access precision test
  2023-11-21  0:22 ` [PATCH v2 bpf-next 02/10] selftests/bpf: add stack access precision test Andrii Nakryiko
@ 2023-11-21  0:42   ` Eduard Zingerman
  0 siblings, 0 replies; 22+ messages in thread
From: Eduard Zingerman @ 2023-11-21  0:42 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
> Add a new selftests that validates precision tracking for stack access
> instruction, using both r10-based and non-r10-based accesses. For
> non-r10 ones we also make sure to have non-zero var_off to validate that
> final stack offset is tracked properly in instruction history
> information inside verifier.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 05/10] selftests/bpf: validate STACK_ZERO is preserved on subreg spill
  2023-11-21  0:22 ` [PATCH v2 bpf-next 05/10] selftests/bpf: validate STACK_ZERO is preserved on subreg spill Andrii Nakryiko
@ 2023-11-21  0:55   ` Eduard Zingerman
  0 siblings, 0 replies; 22+ messages in thread
From: Eduard Zingerman @ 2023-11-21  0:55 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
> Add tests validating that STACK_ZERO slots are preserved when slot is
> partially overwritten with subregister spill.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore
  2023-11-21  0:22 ` [PATCH v2 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
@ 2023-11-21 16:20   ` Eduard Zingerman
  2023-11-21 18:14     ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Eduard Zingerman @ 2023-11-21 16:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
> Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
> stack from slot that has register spilled into it and that register has
> a constant value zero, preserve that zero and mark spilled register as
> precise for that. This makes spilled const zero register and STACK_ZERO
> cases equivalent in their behavior.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> +				if (spill_cnt == size &&
> +				    tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
> +					__mark_reg_const_zero(&state->regs[dst_regno]);
> +					/* this IS register fill, so keep insn_flags */
> +				} else if (zero_cnt == size) {
> +					/* similarly to mark_reg_stack_read(), preserve zeroes */
> +					__mark_reg_const_zero(&state->regs[dst_regno]);
> +					insn_flags = 0; /* not restoring original register state */

nit: In case if there would be v3, could you please
     leave a comment here, something like below:
     
       when check_stack_write_fixed_off() puts STACK_ZERO marks
       for writes, not aligned on register boundary, it marks source
       registers precise. Thus, additional precision propagation is
       necessary in this case and insn_flags could be cleared.

     or something along these lines?

> +				} else {
> +					mark_reg_unknown(env, state->regs, dst_regno);
> +					insn_flags = 0; /* not restoring original register state */
> +				}
>  			}
>  			state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
>  		} else if (dst_regno >= 0) {




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 07/10] selftests/bpf: validate zero preservation for sub-slot loads
  2023-11-21  0:22 ` [PATCH v2 bpf-next 07/10] selftests/bpf: validate zero preservation for sub-slot loads Andrii Nakryiko
@ 2023-11-21 16:20   ` Eduard Zingerman
  2023-11-21 18:15     ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Eduard Zingerman @ 2023-11-21 16:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
[...]

> +SEC("raw_tp")
> +__log_level(2)
> +__success
> +__naked void partial_stack_load_preserves_zeros(void)
> +{
> +	asm volatile (
> +		/* fp-8 is all STACK_ZERO */
> +		"*(u64 *)(r10 -8) = 0;"

This fails when compiled with llvm-16, bpf st assembly support is only
present in llvm-18. If we want to preserve support for llvm-16 this
test would require ifdefs or the following patch:

@@ -3,6 +3,7 @@
 
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
+#include "../../../include/linux/filter.h"
 #include "bpf_misc.h"
 
 struct {
@@ -510,7 +511,7 @@ __naked void partial_stack_load_preserves_zeros(void)
 {
        asm volatile (
                /* fp-8 is all STACK_ZERO */
-               "*(u64 *)(r10 -8) = 0;"
+               ".8byte %[fp8_st_zero];"
 
                /* fp-16 is const zero register */
                "r0 = 0;"
@@ -559,7 +560,8 @@ __naked void partial_stack_load_preserves_zeros(void)
                "r0 = 0;"
                "exit;"
        :
-       : __imm_ptr(single_byte_buf)
+       : __imm_ptr(single_byte_buf),
+         __imm_insn(fp8_st_zero, BPF_ST_MEM(BPF_DW, BPF_REG_FP, -8, 0))
        : __clobber_common);
 }


> +		/* fp-16 is const zero register */
> +		"r0 = 0;"
> +		"*(u64 *)(r10 -16) = r0;"
> +
> +		/* load single U8 from non-aligned STACK_ZERO slot */
> +		"r1 = %[single_byte_buf];"
> +		"r2 = *(u8 *)(r10 -1);"
> +		"r1 += r2;" /* this should be fine */

Question: the comment suggests that adding something other than
          zero would be an error, however error would only be
          reported if *r1 is attempted, maybe add such access?
          E.g. "*(u8 *)(r1 + 0) = r2;"?

[...]




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 09/10] selftests/bpf: validate precision logic in partial_stack_load_preserves_zeros
  2023-11-21  0:22 ` [PATCH v2 bpf-next 09/10] selftests/bpf: validate precision logic in partial_stack_load_preserves_zeros Andrii Nakryiko
@ 2023-11-21 16:20   ` Eduard Zingerman
  0 siblings, 0 replies; 22+ messages in thread
From: Eduard Zingerman @ 2023-11-21 16:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
> Enhance partial_stack_load_preserves_zeros subtest with detailed
> precision propagation log checks. We know expect fp-16 to be spilled,
> initially imprecise, zero const register, which is later marked as
> precise even when partial stack slot load is performed, even if it's not
> a register fill (!).
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 10/10] bpf: use common instruction history across all states
  2023-11-21  0:22 ` [PATCH v2 bpf-next 10/10] bpf: use common instruction history across all states Andrii Nakryiko
@ 2023-11-21 16:20   ` Eduard Zingerman
  0 siblings, 0 replies; 22+ messages in thread
From: Eduard Zingerman @ 2023-11-21 16:20 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team

On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
> Instead of allocating and copying instruction history each time we
> enqueue child verifier state, switch to a model where we use one common
> dynamically sized array of instruction history entries across all states.
> 
> The key observation for proving this is correct is that instruction
> history is only relevant while state is active, which means it either is
> a current state (and thus we are actively modifying instruction history
> and no other state can interfere with us) or we are checkpointed state
> with some children still active (either enqueued or being current).
> 
> In the latter case our portion of instruction history is finalized and
> won't change or grow, so as long as we keep it immutable until the state
> is finalized, we are good.
> 
> Now, when state is finalized and is put into state hash for potentially
> future pruning lookups, instruction history is not used anymore. This is
> because instruction history is only used by precision marking logic, and
> we never modify precision markings for finalized states.
> 
> So, instead of each state having its own small instruction history, we
> keep a global dynamically-sized instruction history, where each state in
> current DFS path from root to active state remembers its portion of
> instruction history.  Current state can append to this history, but
> cannot modify any of its parent histories.
> 
> Because the insn_hist array can be grown through realloc, states don't
> keep pointers, they instead maintain two indices, [start, end), into
> global instruction history array. End is exclusive index, so
> `start == end` means there is no relevant instruction history.
> 
> This eliminates a lot of allocations and minimizes overall memory usage.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore
  2023-11-21 16:20   ` Eduard Zingerman
@ 2023-11-21 18:14     ` Andrii Nakryiko
  2023-11-21 20:22       ` Eduard Zingerman
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21 18:14 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Tue, Nov 21, 2023 at 8:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
> > Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
> > stack from slot that has register spilled into it and that register has
> > a constant value zero, preserve that zero and mark spilled register as
> > precise for that. This makes spilled const zero register and STACK_ZERO
> > cases equivalent in their behavior.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > +                             if (spill_cnt == size &&
> > +                                 tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
> > +                                     __mark_reg_const_zero(&state->regs[dst_regno]);
> > +                                     /* this IS register fill, so keep insn_flags */
> > +                             } else if (zero_cnt == size) {
> > +                                     /* similarly to mark_reg_stack_read(), preserve zeroes */
> > +                                     __mark_reg_const_zero(&state->regs[dst_regno]);
> > +                                     insn_flags = 0; /* not restoring original register state */
>
> nit: In case if there would be v3, could you please
>      leave a comment here, something like below:
>
>        when check_stack_write_fixed_off() puts STACK_ZERO marks
>        for writes, not aligned on register boundary, it marks source
>        registers precise. Thus, additional precision propagation is
>        necessary in this case and insn_flags could be cleared.
>
>      or something along these lines?
>

Hm... this seems misleading, precision propagation of original
register on write is orthogonal to this. The real reason why we clear
insn_flags is because there is no *register* fill here. There might
not be any spilled register at all here, or a completely irrelevant
spilled register. This instruction is basically `r1 = 0;`, though
obfuscated as stack dereference. So from the backtracking side of
things, there is no stack access here.



> > +                             } else {
> > +                                     mark_reg_unknown(env, state->regs, dst_regno);
> > +                                     insn_flags = 0; /* not restoring original register state */
> > +                             }
> >                       }
> >                       state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
> >               } else if (dst_regno >= 0) {
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 07/10] selftests/bpf: validate zero preservation for sub-slot loads
  2023-11-21 16:20   ` Eduard Zingerman
@ 2023-11-21 18:15     ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21 18:15 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Tue, Nov 21, 2023 at 8:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
> [...]
>
> > +SEC("raw_tp")
> > +__log_level(2)
> > +__success
> > +__naked void partial_stack_load_preserves_zeros(void)
> > +{
> > +     asm volatile (
> > +             /* fp-8 is all STACK_ZERO */
> > +             "*(u64 *)(r10 -8) = 0;"
>
> This fails when compiled with llvm-16, bpf st assembly support is only
> present in llvm-18. If we want to preserve support for llvm-16 this
> test would require ifdefs or the following patch:

Ok, I'll use that, thanks! I need BPF_ST here to avoid register spill code path.


>
> @@ -3,6 +3,7 @@
>
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
> +#include "../../../include/linux/filter.h"
>  #include "bpf_misc.h"
>
>  struct {
> @@ -510,7 +511,7 @@ __naked void partial_stack_load_preserves_zeros(void)
>  {
>         asm volatile (
>                 /* fp-8 is all STACK_ZERO */
> -               "*(u64 *)(r10 -8) = 0;"
> +               ".8byte %[fp8_st_zero];"
>
>                 /* fp-16 is const zero register */
>                 "r0 = 0;"
> @@ -559,7 +560,8 @@ __naked void partial_stack_load_preserves_zeros(void)
>                 "r0 = 0;"
>                 "exit;"
>         :
> -       : __imm_ptr(single_byte_buf)
> +       : __imm_ptr(single_byte_buf),
> +         __imm_insn(fp8_st_zero, BPF_ST_MEM(BPF_DW, BPF_REG_FP, -8, 0))
>         : __clobber_common);
>  }
>
>
> > +             /* fp-16 is const zero register */
> > +             "r0 = 0;"
> > +             "*(u64 *)(r10 -16) = r0;"
> > +
> > +             /* load single U8 from non-aligned STACK_ZERO slot */
> > +             "r1 = %[single_byte_buf];"
> > +             "r2 = *(u8 *)(r10 -1);"
> > +             "r1 += r2;" /* this should be fine */
>
> Question: the comment suggests that adding something other than
>           zero would be an error, however error would only be
>           reported if *r1 is attempted, maybe add such access?
>           E.g. "*(u8 *)(r1 + 0) = r2;"?

you are right! I assumed we check bounds during pointer increment, but
I'm wrong. I added dereference instruction everywhere, thanks!

>
> [...]
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore
  2023-11-21 18:14     ` Andrii Nakryiko
@ 2023-11-21 20:22       ` Eduard Zingerman
  0 siblings, 0 replies; 22+ messages in thread
From: Eduard Zingerman @ 2023-11-21 20:22 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team

On Tue, 2023-11-21 at 10:14 -0800, Andrii Nakryiko wrote:
> On Tue, Nov 21, 2023 at 8:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Mon, 2023-11-20 at 16:22 -0800, Andrii Nakryiko wrote:
> > > Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
> > > stack from slot that has register spilled into it and that register has
> > > a constant value zero, preserve that zero and mark spilled register as
> > > precise for that. This makes spilled const zero register and STACK_ZERO
> > > cases equivalent in their behavior.
> > > 
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > 
> > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > 
> > [...]
> > 
> > > +                             if (spill_cnt == size &&
> > > +                                 tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
> > > +                                     __mark_reg_const_zero(&state->regs[dst_regno]);
> > > +                                     /* this IS register fill, so keep insn_flags */
> > > +                             } else if (zero_cnt == size) {
> > > +                                     /* similarly to mark_reg_stack_read(), preserve zeroes */
> > > +                                     __mark_reg_const_zero(&state->regs[dst_regno]);
> > > +                                     insn_flags = 0; /* not restoring original register state */
> > 
> > nit: In case if there would be v3, could you please
> >      leave a comment here, something like below:
> > 
> >        when check_stack_write_fixed_off() puts STACK_ZERO marks
> >        for writes, not aligned on register boundary, it marks source
> >        registers precise. Thus, additional precision propagation is
> >        necessary in this case and insn_flags could be cleared.
> > 
> >      or something along these lines?
> > 
> 
> Hm... this seems misleading, precision propagation of original
> register on write is orthogonal to this. The real reason why we clear
> insn_flags is because there is no *register* fill here. There might
> not be any spilled register at all here, or a completely irrelevant
> spilled register. This instruction is basically `r1 = 0;`, though
> obfuscated as stack dereference. So from the backtracking side of
> things, there is no stack access here.

Hm, I mostly agree. This comment would be relevant only before patch #8
in this series.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 08/10] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-11-21  0:22 ` [PATCH v2 bpf-next 08/10] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
@ 2023-11-21 20:38   ` Alexei Starovoitov
  2023-11-21 22:01     ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2023-11-21 20:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, martin.lau, kernel-team, Eduard Zingerman

On Mon, Nov 20, 2023 at 04:22:19PM -0800, Andrii Nakryiko wrote:
> include it here. But the reduction in states is due to the following
> piece of C code:
> 
>         unsigned long ino;
> 
> 	...
> 
>         sk = s->sk_socket;
>         if (!sk) {
>                 ino = 0;
>         } else {
>                 inode = SOCK_INODE(sk);
>                 bpf_probe_read_kernel(&ino, sizeof(ino), &inode->i_ino);
>         }
>         BPF_SEQ_PRINTF(seq, "%-8u %-8lu\n", s->sk_drops.counter, ino);
> 	return 0;
> 
> You can see that in some situations `ino` is zero-initialized, while in
> others it's unknown value filled out by bpf_probe_read_kernel(). Before
> this change both branches have to be validated twice. Once with

I think you wanted to say that the code _after_ both branches converge
had to be validated twice.
With or without this patch both branches (ino = 0 and probe_read)
will be validated only once. It's the code that after the branch
that gets state pruned after this patch.

> (precise) ino == 0, due to eager STACK_ZERO logic, and then again for
> when ino is just STACK_MISC. But BPF_SEQ_PRINTF() doesn't care about
> precise value of ino, so with the change in this patch verifier is able
> to prune states from one of the branches, reducing number of total
> states (and instructions) required for successful validation.

This part is good.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 bpf-next 08/10] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
  2023-11-21 20:38   ` Alexei Starovoitov
@ 2023-11-21 22:01     ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-11-21 22:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
	Eduard Zingerman

On Tue, Nov 21, 2023 at 12:38 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Nov 20, 2023 at 04:22:19PM -0800, Andrii Nakryiko wrote:
> > include it here. But the reduction in states is due to the following
> > piece of C code:
> >
> >         unsigned long ino;
> >
> >       ...
> >
> >         sk = s->sk_socket;
> >         if (!sk) {
> >                 ino = 0;
> >         } else {
> >                 inode = SOCK_INODE(sk);
> >                 bpf_probe_read_kernel(&ino, sizeof(ino), &inode->i_ino);
> >         }
> >         BPF_SEQ_PRINTF(seq, "%-8u %-8lu\n", s->sk_drops.counter, ino);
> >       return 0;
> >
> > You can see that in some situations `ino` is zero-initialized, while in
> > others it's unknown value filled out by bpf_probe_read_kernel(). Before
> > this change both branches have to be validated twice. Once with
>
> I think you wanted to say that the code _after_ both branches converge
> had to be validated twice.
> With or without this patch both branches (ino = 0 and probe_read)
> will be validated only once. It's the code that after the branch
> that gets state pruned after this patch.

Yes, correct, it's the common code after two branches that now will
converge, so BPF_SEQ_PRINTF() invocation in this case, I'll improve
the wording.

In practice a slightly modified variant also happens: we
zero-initialize something, then depending on some conditions (error
checking, etc), we overwrite zero-initialized stack slots with some
unknown values that are not precise. Before this change we'll have
STACK_ZERO and STACK_MISC in different branches/code paths, which
would effectively duplicate all subsequent states that needs to be
verified. Now there is a high chance for this STACK_ZERO vs STACK_MISC
to not matter at all. We also can have spilled imprecise SCALAR_VALUE
register instead of STACK_MISC. The principal idea is that STACK_ZERO
promises a lot (that it is precisely zero), while often time those
values are just some integers with values we don't care about.

>
> > (precise) ino == 0, due to eager STACK_ZERO logic, and then again for
> > when ino is just STACK_MISC. But BPF_SEQ_PRINTF() doesn't care about
> > precise value of ino, so with the change in this patch verifier is able
> > to prune states from one of the branches, reducing number of total
> > states (and instructions) required for successful validation.
>
> This part is good.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-11-21 22:01 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-21  0:22 [PATCH v2 bpf-next 00/10] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
2023-11-21  0:22 ` [PATCH v2 bpf-next 01/10] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
2023-11-21  0:22 ` [PATCH v2 bpf-next 02/10] selftests/bpf: add stack access precision test Andrii Nakryiko
2023-11-21  0:42   ` Eduard Zingerman
2023-11-21  0:22 ` [PATCH v2 bpf-next 03/10] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
2023-11-21  0:22 ` [PATCH v2 bpf-next 04/10] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
2023-11-21  0:22 ` [PATCH v2 bpf-next 05/10] selftests/bpf: validate STACK_ZERO is preserved on subreg spill Andrii Nakryiko
2023-11-21  0:55   ` Eduard Zingerman
2023-11-21  0:22 ` [PATCH v2 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
2023-11-21 16:20   ` Eduard Zingerman
2023-11-21 18:14     ` Andrii Nakryiko
2023-11-21 20:22       ` Eduard Zingerman
2023-11-21  0:22 ` [PATCH v2 bpf-next 07/10] selftests/bpf: validate zero preservation for sub-slot loads Andrii Nakryiko
2023-11-21 16:20   ` Eduard Zingerman
2023-11-21 18:15     ` Andrii Nakryiko
2023-11-21  0:22 ` [PATCH v2 bpf-next 08/10] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
2023-11-21 20:38   ` Alexei Starovoitov
2023-11-21 22:01     ` Andrii Nakryiko
2023-11-21  0:22 ` [PATCH v2 bpf-next 09/10] selftests/bpf: validate precision logic in partial_stack_load_preserves_zeros Andrii Nakryiko
2023-11-21 16:20   ` Eduard Zingerman
2023-11-21  0:22 ` [PATCH v2 bpf-next 10/10] bpf: use common instruction history across all states Andrii Nakryiko
2023-11-21 16:20   ` Eduard Zingerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox