BPF List
* [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
@ 2025-02-28  6:00 Eduard Zingerman
  2025-02-28  6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-02-28  6:00 UTC (permalink / raw)
  To: bpf, ast
  Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song, tj,
	Eduard Zingerman

This patch set introduces a simple live register DFA analysis.
The analysis is performed as a separate step before the main
verification pass, and results are stored in `env->insn_aux_data` for
each instruction.

This change improves handling of iterator/callback-based loops, as
regular register liveness marks are not finalized while loops are
being processed. See veristat results for selftests and sched_ext in
patch #2.

The patch set was tested in branch [1] by disabling the current
register parentage chain liveness computation and instead using
DFA-based liveness for registers, while assuming all stack slots are
live. No notable regressions were found in test_progs-based tests.

Note: For regular subprogram calls, the analysis conservatively
assumes that registers r1-r5 are used and that r0 is used at each
`exit` instruction. Experiments in [1] show that adding precise
handling for these cases has no impact on verification performance for
selftests and sched_ext.
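
The conservative masks described above can be illustrated with a small
Python sketch (an informal model, not kernel code; the names and the
bit layout, with bit i standing for register ri, are assumptions of
this sketch):

```python
# Illustrative model of the conservative use/def masks described above.
# Bit i of a mask represents register ri.
CALLER_SAVED_REGS = 6  # r0-r5 are caller-saved in BPF

def call_use_def():
    """Opaque subprogram call: assume r1-r5 are read, r0-r5 clobbered."""
    defs = (1 << CALLER_SAVED_REGS) - 1  # r0-r5 written
    use = defs & ~1                      # r1-r5 read, r0 not read
    return use, defs

def exit_use_def():
    """'exit' reads r0 and writes nothing."""
    return 0b1, 0
```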

This was previously shared as RFC [2].
Changes since RFC:
- The parameter count for helpers and kfuncs is now taken into
  account;
- The analysis is now enabled only in privileged mode (Alexei);
- The `copy_verifier_state()` bug fix was merged separately and is no
  longer a part of this patch set.

[1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off
[2] https://lore.kernel.org/bpf/20250122120442.3536298-1-eddyz87@gmail.com/

Eduard Zingerman (3):
  bpf: simple DFA-based live registers analysis
  bpf: use register liveness information for func_states_equal
  selftests/bpf: test cases for compute_live_registers()

 include/linux/bpf_verifier.h                  |   7 +
 kernel/bpf/verifier.c                         | 394 +++++++++++++++--
 .../testing/selftests/bpf/prog_tests/align.c  |  11 +-
 .../bpf/prog_tests/compute_live_registers.c   |   9 +
 tools/testing/selftests/bpf/progs/bpf_misc.h  |  12 +
 .../bpf/progs/compute_live_registers.c        | 397 ++++++++++++++++++
 .../selftests/bpf/progs/verifier_gotol.c      |   6 +-
 .../bpf/progs/verifier_iterating_callbacks.c  |   6 +-
 8 files changed, 804 insertions(+), 38 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/compute_live_registers.c
 create mode 100644 tools/testing/selftests/bpf/progs/compute_live_registers.c

-- 
2.48.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH bpf-next v1 1/3] bpf: simple DFA-based live registers analysis
  2025-02-28  6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
@ 2025-02-28  6:00 ` Eduard Zingerman
  2025-03-01  2:01   ` Alexei Starovoitov
  2025-02-28  6:00 ` [PATCH bpf-next v1 2/3] bpf: use register liveness information for func_states_equal Eduard Zingerman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2025-02-28  6:00 UTC (permalink / raw)
  To: bpf, ast
  Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song, tj,
	Eduard Zingerman

Compute may-live registers before each instruction in the program.
A register is live before instruction I if it is read by I or by some
instruction S that follows I, provided it is not overwritten between I
and S.

This information will be used in the next patch as a hint in
func_states_equal().

Use a simple algorithm described in [1] to compute this information:
- define the following:
  - I.use : the set of all registers read by instruction I;
  - I.def : the set of all registers written by instruction I;
  - I.in  : the set of all registers that may be alive before I
            executes;
  - I.out : the set of all registers that may be alive after I
            executes;
  - I.successors : the set of instructions S that might immediately
                   follow I for some program execution;
- associate separate empty sets 'I.in' and 'I.out' with each instruction;
- visit each instruction in postorder and update the corresponding
  'I.in' and 'I.out' sets as follows:

      I.out = U [S.in for S in I.successors]
      I.in  = (I.out / I.def) U I.use

  (where U stands for set union, / stands for set difference)
- repeat the computation while I.{in,out} changes for any instruction.
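
For illustration, the fixed-point iteration above can be sketched in
Python (an informal model, not the kernel implementation; instructions
are modeled as (use, def, successors) triples, and plain reverse
instruction order stands in for CFG postorder, which coincides for
this straight-line example):

```python
# Informal model of the fixed-point liveness computation described
# above; register sets are bitmasks, bit i = register ri.
def live_regs(insns):
    """insns: list of (use, def, successor-indices) triples."""
    n = len(insns)
    live_in = [0] * n
    live_out = [0] * n
    changed = True
    while changed:
        changed = False
        # Reverse order approximates postorder for straight-line code.
        for i in range(n - 1, -1, -1):
            use, defs, succs = insns[i]
            out = 0
            for s in succs:                  # I.out = U [S.in]
                out |= live_in[s]
            new_in = (out & ~defs) | use     # I.in = (I.out / I.def) U I.use
            if new_in != live_in[i] or out != live_out[i]:
                live_in[i], live_out[i] = new_in, out
                changed = True
    return live_in

# r1 = 42 ; r0 = r1 ; exit
prog = [(0b00, 0b10, [1]),   # def r1
        (0b10, 0b01, [2]),   # use r1, def r0
        (0b01, 0b00, [])]    # exit: use r0
```

Running `live_regs(prog)` yields `[0b00, 0b10, 0b01]`: nothing is live
before the first instruction, r1 is live before the second, and r0 is
live before the exit.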

On the implementation side, keep things as simple as possible:
- check_cfg() already marks instructions EXPLORED in post-order;
  modify it to save the index of each EXPLORED instruction in a vector;
- represent I.{in,out,use,def} as bitmasks;
- don't split the program into basic blocks and don't maintain a
  work queue; instead:
  - perform the fixed-point computation by visiting each instruction;
  - maintain a simple 'changed' flag to track whether I.{in,out}
    changes for any instruction.

  Measurements show that even this simplistic implementation does not
  add measurable verification time overhead (at least for selftests).

Note on the check_cfg() ex_insn_beg/ex_done change:
To avoid out-of-bounds access to the env->cfg.insn_postorder array,
it must be guaranteed that an instruction transitions to the EXPLORED
state only once. Previously, this was not the case for incorrect
programs with direct calls to exception callbacks.

The 'align' selftest needs an adjustment to skip the computed
instruction/live registers printout. Otherwise, it matches lines from
this printout instead of from the verification log.

[1] https://en.wikipedia.org/wiki/Live-variable_analysis

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 include/linux/bpf_verifier.h                  |   6 +
 kernel/bpf/verifier.c                         | 376 ++++++++++++++++--
 .../testing/selftests/bpf/prog_tests/align.c  |  11 +-
 3 files changed, 369 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index bbd013c38ff9..8c23958bc471 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -591,6 +591,8 @@ struct bpf_insn_aux_data {
 	 * accepts callback function as a parameter.
 	 */
 	bool calls_callback;
+	/* registers alive before this instruction. */
+	u16 live_regs_before;
 };
 
 #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
@@ -747,7 +749,11 @@ struct bpf_verifier_env {
 	struct {
 		int *insn_state;
 		int *insn_stack;
+		/* vector of instruction indexes sorted in post-order */
+		int *insn_postorder;
 		int cur_stack;
+		/* current position in the insn_postorder vector */
+		int cur_postorder;
 	} cfg;
 	struct backtrack_state bt;
 	struct bpf_insn_hist_entry *insn_hist;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dcd0da4e62fc..4ac7dc58d9b1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3353,6 +3353,15 @@ static int add_subprog_and_kfunc(struct bpf_verifier_env *env)
 	return 0;
 }
 
+static int jmp_offset(struct bpf_insn *insn)
+{
+	u8 code = insn->code;
+
+	if (code == (BPF_JMP32 | BPF_JA))
+		return insn->imm;
+	return insn->off;
+}
+
 static int check_subprogs(struct bpf_verifier_env *env)
 {
 	int i, subprog_start, subprog_end, off, cur_subprog = 0;
@@ -3379,10 +3388,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			goto next;
 		if (BPF_OP(code) == BPF_EXIT || BPF_OP(code) == BPF_CALL)
 			goto next;
-		if (code == (BPF_JMP32 | BPF_JA))
-			off = i + insn[i].imm + 1;
-		else
-			off = i + insn[i].off + 1;
+		off = i + jmp_offset(&insn[i]) + 1;
 		if (off < subprog_start || off >= subprog_end) {
 			verbose(env, "jump out of range from insn %d to %d\n", i, off);
 			return -EINVAL;
@@ -3912,6 +3918,17 @@ static const char *disasm_kfunc_name(void *data, const struct bpf_insn *insn)
 	return btf_name_by_offset(desc_btf, func->name_off);
 }
 
+static void verbose_insn(struct bpf_verifier_env *env, struct bpf_insn *insn)
+{
+	const struct bpf_insn_cbs cbs = {
+		.cb_call	= disasm_kfunc_name,
+		.cb_print	= verbose,
+		.private_data	= env,
+	};
+
+	print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
+}
+
 static inline void bt_init(struct backtrack_state *bt, u32 frame)
 {
 	bt->frame = frame;
@@ -4112,11 +4129,6 @@ static bool calls_callback(struct bpf_verifier_env *env, int insn_idx);
 static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 			  struct bpf_insn_hist_entry *hist, struct backtrack_state *bt)
 {
-	const struct bpf_insn_cbs cbs = {
-		.cb_call	= disasm_kfunc_name,
-		.cb_print	= verbose,
-		.private_data	= env,
-	};
 	struct bpf_insn *insn = env->prog->insnsi + idx;
 	u8 class = BPF_CLASS(insn->code);
 	u8 opcode = BPF_OP(insn->code);
@@ -4134,7 +4146,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 		fmt_stack_mask(env->tmp_str_buf, TMP_STR_BUF_LEN, bt_stack_mask(bt));
 		verbose(env, "stack=%s before ", env->tmp_str_buf);
 		verbose(env, "%d: ", idx);
-		print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
+		verbose_insn(env, insn);
 	}
 
 	/* If there is a history record that some registers gained range at this insn,
@@ -11011,6 +11023,9 @@ static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
 	return *ptr ? 0 : -EINVAL;
 }
 
+/* Bitmask with 1s for all caller saved registers */
+#define ALL_CALLER_SAVED_REGS ((1u << CALLER_SAVED_REGS) - 1)
+
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx_p)
 {
@@ -17246,9 +17261,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 static int check_cfg(struct bpf_verifier_env *env)
 {
 	int insn_cnt = env->prog->len;
-	int *insn_stack, *insn_state;
+	int *insn_stack, *insn_state, *insn_postorder;
 	int ex_insn_beg, i, ret = 0;
-	bool ex_done = false;
 
 	insn_state = env->cfg.insn_state = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
 	if (!insn_state)
@@ -17260,6 +17274,17 @@ static int check_cfg(struct bpf_verifier_env *env)
 		return -ENOMEM;
 	}
 
+	insn_postorder = env->cfg.insn_postorder = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
+	if (!insn_postorder) {
+		kvfree(insn_state);
+		kvfree(insn_stack);
+		return -ENOMEM;
+	}
+
+	ex_insn_beg = env->exception_callback_subprog
+		      ? env->subprog_info[env->exception_callback_subprog].start
+		      : 0;
+
 	insn_state[0] = DISCOVERED; /* mark 1st insn as discovered */
 	insn_stack[0] = 0; /* 0 is the first instruction */
 	env->cfg.cur_stack = 1;
@@ -17273,6 +17298,7 @@ static int check_cfg(struct bpf_verifier_env *env)
 		case DONE_EXPLORING:
 			insn_state[t] = EXPLORED;
 			env->cfg.cur_stack--;
+			insn_postorder[env->cfg.cur_postorder++] = t;
 			break;
 		case KEEP_EXPLORING:
 			break;
@@ -17291,13 +17317,10 @@ static int check_cfg(struct bpf_verifier_env *env)
 		goto err_free;
 	}
 
-	if (env->exception_callback_subprog && !ex_done) {
-		ex_insn_beg = env->subprog_info[env->exception_callback_subprog].start;
-
+	if (ex_insn_beg && insn_state[ex_insn_beg] != EXPLORED) {
 		insn_state[ex_insn_beg] = DISCOVERED;
 		insn_stack[0] = ex_insn_beg;
 		env->cfg.cur_stack = 1;
-		ex_done = true;
 		goto walk_cfg;
 	}
 
@@ -19121,19 +19144,13 @@ static int do_check(struct bpf_verifier_env *env)
 		}
 
 		if (env->log.level & BPF_LOG_LEVEL) {
-			const struct bpf_insn_cbs cbs = {
-				.cb_call	= disasm_kfunc_name,
-				.cb_print	= verbose,
-				.private_data	= env,
-			};
-
 			if (verifier_state_scratched(env))
 				print_insn_state(env, state, state->curframe);
 
 			verbose_linfo(env, env->insn_idx, "; ");
 			env->prev_log_pos = env->log.end_pos;
 			verbose(env, "%d: ", env->insn_idx);
-			print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
+			verbose_insn(env, insn);
 			env->prev_insn_print_pos = env->log.end_pos - env->prev_log_pos;
 			env->prev_log_pos = env->log.end_pos;
 		}
@@ -23199,6 +23216,312 @@ static int process_fd_array(struct bpf_verifier_env *env, union bpf_attr *attr,
 	return 0;
 }
 
+static bool can_fallthrough(struct bpf_insn *insn)
+{
+	u8 class = BPF_CLASS(insn->code);
+	u8 opcode = BPF_OP(insn->code);
+
+	if (class != BPF_JMP && class != BPF_JMP32)
+		return true;
+
+	if (opcode == BPF_EXIT || opcode == BPF_JA)
+		return false;
+
+	return true;
+}
+
+static bool can_jump(struct bpf_insn *insn)
+{
+	u8 class = BPF_CLASS(insn->code);
+	u8 opcode = BPF_OP(insn->code);
+
+	if (class != BPF_JMP && class != BPF_JMP32)
+		return false;
+
+	switch (opcode) {
+	case BPF_JA:
+	case BPF_JEQ:
+	case BPF_JNE:
+	case BPF_JLT:
+	case BPF_JLE:
+	case BPF_JGT:
+	case BPF_JGE:
+	case BPF_JSGT:
+	case BPF_JSGE:
+	case BPF_JSLT:
+	case BPF_JSLE:
+	case BPF_JCOND:
+		return true;
+	}
+
+	return false;
+}
+
+static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
+{
+	struct bpf_insn *insn = &prog->insnsi[idx];
+	int i = 0, insn_sz;
+	u32 dst;
+
+	succ[0] = prog->len;
+	succ[1] = prog->len;
+
+	insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
+	if (can_fallthrough(insn) && idx + 1 < prog->len)
+		succ[i++] = idx + insn_sz;
+
+	if (can_jump(insn)) {
+		dst = idx + jmp_offset(insn) + 1;
+		if (i == 0 || succ[0] != dst)
+			succ[i++] = dst;
+	}
+
+	return i;
+}
+
+/* Each field is a register bitmask */
+struct insn_live_regs {
+	u16 use;	/* registers read by instruction */
+	u16 def;	/* registers written by instruction */
+	u16 in;		/* registers that may be alive before instruction */
+	u16 out;	/* registers that may be alive after instruction */
+};
+
+/* Compute *use and *def values for the call instruction */
+static void compute_call_live_regs(struct bpf_verifier_env *env,
+				   struct bpf_insn *insn,
+				   u16 *use, u16 *def)
+{
+	struct bpf_kfunc_call_arg_meta meta;
+	const struct bpf_func_proto *fn;
+	int err, i, nargs;
+
+	*def = ALL_CALLER_SAVED_REGS;
+	*use = *def & ~BIT(BPF_REG_0);
+	if (bpf_helper_call(insn)) {
+		err = get_helper_proto(env, insn->imm, &fn);
+		if (err)
+			return;
+		*use = 0;
+		for (i = 1; i < CALLER_SAVED_REGS; i++) {
+			if (fn->arg_type[i - 1] == ARG_DONTCARE)
+				break;
+			*use |= BIT(i);
+		}
+	} else if (bpf_pseudo_kfunc_call(insn)) {
+		err = fetch_kfunc_meta(env, insn, &meta, NULL);
+		if (err)
+			return;
+		nargs = btf_type_vlen(meta.func_proto);
+		*use = 0;
+		for (i = 1; i <= nargs; i++)
+			*use |= BIT(i);
+	}
+}
+
+/* Compute info->{use,def} fields for the instruction */
+static void compute_insn_live_regs(struct bpf_verifier_env *env,
+				   struct bpf_insn *insn,
+				   struct insn_live_regs *info)
+{
+	u8 class = BPF_CLASS(insn->code);
+	u8 code = BPF_OP(insn->code);
+	u8 mode = BPF_MODE(insn->code);
+	u16 src = BIT(insn->src_reg);
+	u16 dst = BIT(insn->dst_reg);
+	u16 r0  = BIT(0);
+	u16 def = 0;
+	u16 use = 0xffff;
+
+	switch (class) {
+	case BPF_LD:
+		switch (mode) {
+		case BPF_IMM:
+			if (BPF_SIZE(insn->code) == BPF_DW) {
+				def = dst;
+				use = 0;
+			}
+			break;
+		case BPF_LD | BPF_ABS:
+		case BPF_LD | BPF_IND:
+			/* stick with defaults */
+			break;
+		}
+		break;
+	case BPF_LDX:
+		switch (mode) {
+		case BPF_MEM:
+		case BPF_MEMSX:
+			def = dst;
+			use = src;
+			break;
+		}
+		break;
+	case BPF_ST:
+		switch (mode) {
+		case BPF_MEM:
+			def = 0;
+			use = dst;
+			break;
+		}
+		break;
+	case BPF_STX:
+		switch (mode) {
+		case BPF_MEM:
+			def = 0;
+			use = dst | src;
+			break;
+		case BPF_ATOMIC:
+			use = dst | src;
+			if (insn->imm & BPF_FETCH) {
+				if (insn->imm == BPF_CMPXCHG)
+					def = r0;
+				else
+					def = src;
+			} else {
+				def = 0;
+			}
+			break;
+		}
+		break;
+	case BPF_ALU:
+	case BPF_ALU64:
+		switch (code) {
+		case BPF_END:
+			use = dst;
+			def = dst;
+			break;
+		case BPF_MOV:
+			def = dst;
+			if (BPF_SRC(insn->code) == BPF_K)
+				use = 0;
+			else
+				use = src;
+			break;
+		default:
+			def = dst;
+			if (BPF_SRC(insn->code) == BPF_K)
+				use = dst;
+			else
+				use = dst | src;
+		}
+		break;
+	case BPF_JMP:
+	case BPF_JMP32:
+		switch (code) {
+		case BPF_JA:
+			def = 0;
+			use = 0;
+			break;
+		case BPF_EXIT:
+			def = 0;
+			use = r0;
+			break;
+		case BPF_CALL:
+			compute_call_live_regs(env, insn, &use, &def);
+			break;
+		default:
+			def = 0;
+			if (BPF_SRC(insn->code) == BPF_K)
+				use = dst;
+			else
+				use = dst | src;
+		}
+		break;
+	}
+
+	info->def = def;
+	info->use = use;
+}
+
+/* Compute may-live registers before each instruction in the program.
+ * A register is live before instruction I if it is read by I or by some
+ * instruction S that follows I, provided it is not overwritten between I
+ * and S.
+ *
+ * Store result in env->insn_aux_data[i].live_regs.
+ */
+static int compute_live_registers(struct bpf_verifier_env *env)
+{
+	struct bpf_insn_aux_data *insn_aux = env->insn_aux_data;
+	struct bpf_insn *insns = env->prog->insnsi;
+	struct insn_live_regs *state;
+	int insn_cnt = env->prog->len;
+	int err = 0, i, j;
+	bool changed;
+
+	/* Use the simple algorithm described in:
+	 * https://en.wikipedia.org/wiki/Live-variable_analysis
+	 *
+	 * - visit each instruction in postorder and update
+	 *   state[i].in, state[i].out as follows:
+	 *
+	 *       state[i].out = U [state[s].in for s in insn_successors(i)]
+	 *       state[i].in  = (state[i].out / state[i].def) U state[i].use
+	 *
+	 *   (where U stands for set union, / stands for set difference)
+	 * - repeat the computation while {in,out} fields change for
+	 *   any instruction.
+	 */
+	state = kvcalloc(insn_cnt, sizeof(*state), GFP_KERNEL);
+	if (!state) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < insn_cnt; ++i)
+		compute_insn_live_regs(env, &insns[i], &state[i]);
+
+	changed = true;
+	while (changed) {
+		changed = false;
+		for (i = 0; i < env->cfg.cur_postorder; ++i) {
+			int insn_idx = env->cfg.insn_postorder[i];
+			struct insn_live_regs *live = &state[insn_idx];
+			int succ_num;
+			u32 succ[2];
+			u16 new_out = 0;
+			u16 new_in = 0;
+
+			succ_num = insn_successors(env->prog, insn_idx, succ);
+			for (int s = 0; s < succ_num; ++s)
+				new_out |= state[succ[s]].in;
+			new_in = (new_out & ~live->def) | live->use;
+			if (new_out != live->out || new_in != live->in) {
+				live->in = new_in;
+				live->out = new_out;
+				changed = true;
+			}
+		}
+	}
+
+	for (i = 0; i < insn_cnt; ++i)
+		insn_aux[i].live_regs_before = state[i].in;
+
+	if (env->log.level & BPF_LOG_LEVEL2) {
+		verbose(env, "Live regs before insn:\n");
+		for (i = 0; i < insn_cnt; ++i) {
+			verbose(env, "%3d: ", i);
+			for (j = BPF_REG_0; j < BPF_REG_10; ++j)
+				if (insn_aux[i].live_regs_before & BIT(j))
+					verbose(env, "%d", j);
+				else
+					verbose(env, ".");
+			verbose(env, " ");
+			verbose_insn(env, &insns[i]);
+			if (bpf_is_ldimm64(&insns[i]))
+				i++;
+		}
+	}
+
+out:
+	kvfree(state);
+	kvfree(env->cfg.insn_postorder);
+	env->cfg.insn_postorder = NULL;
+	env->cfg.cur_postorder = 0;
+	return err;
+}
+
 int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
 {
 	u64 start_time = ktime_get_ns();
@@ -23320,6 +23643,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (ret)
 		goto skip_full_check;
 
+	if (is_priv) {
+		ret = compute_live_registers(env);
+		if (ret < 0)
+			goto skip_full_check;
+	}
+
 	ret = mark_fastcall_patterns(env);
 	if (ret < 0)
 		goto skip_full_check;
@@ -23458,6 +23787,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	vfree(env->insn_aux_data);
 	kvfree(env->insn_hist);
 err_free_env:
+	kvfree(env->cfg.insn_postorder);
 	kvfree(env);
 	return ret;
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/align.c b/tools/testing/selftests/bpf/prog_tests/align.c
index 4ebd0da898f5..1d53a8561ee2 100644
--- a/tools/testing/selftests/bpf/prog_tests/align.c
+++ b/tools/testing/selftests/bpf/prog_tests/align.c
@@ -610,9 +610,11 @@ static int do_test_single(struct bpf_align_test *test)
 		.log_size = sizeof(bpf_vlog),
 		.log_level = 2,
 	);
+	const char *main_pass_start = "0: R1=ctx() R10=fp0";
 	const char *line_ptr;
 	int cur_line = -1;
 	int prog_len, i;
+	char *start;
 	int fd_prog;
 	int ret;
 
@@ -632,7 +634,13 @@ static int do_test_single(struct bpf_align_test *test)
 		ret = 0;
 		/* We make a local copy so that we can strtok() it */
 		strncpy(bpf_vlog_copy, bpf_vlog, sizeof(bpf_vlog_copy));
-		line_ptr = strtok(bpf_vlog_copy, "\n");
+		start = strstr(bpf_vlog_copy, main_pass_start);
+		if (!start) {
+			ret = 1;
+			printf("Can't find initial line '%s'\n", main_pass_start);
+			goto out;
+		}
+		line_ptr = strtok(start, "\n");
 		for (i = 0; i < MAX_MATCHES; i++) {
 			struct bpf_reg_match m = test->matches[i];
 			const char *p;
@@ -682,6 +690,7 @@ static int do_test_single(struct bpf_align_test *test)
 				break;
 			}
 		}
+out:
 		if (fd_prog >= 0)
 			close(fd_prog);
 	}
-- 
2.48.1



* [PATCH bpf-next v1 2/3] bpf: use register liveness information for func_states_equal
  2025-02-28  6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
  2025-02-28  6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
@ 2025-02-28  6:00 ` Eduard Zingerman
  2025-02-28  6:00 ` [PATCH bpf-next v1 3/3] selftests/bpf: test cases for compute_live_registers() Eduard Zingerman
  2025-03-01  2:10 ` [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Alexei Starovoitov
  3 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-02-28  6:00 UTC (permalink / raw)
  To: bpf, ast
  Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song, tj,
	Eduard Zingerman

The liveness analysis DFA computes the set of registers that are live
before each instruction. Leverage this information to skip the
comparison of dead registers in `func_states_equal()`. This helps with
the convergence of iterator-based loops, as `bpf_reg_state->live`
marks can't be used while loops are being processed. For now, enable
this only in privileged mode.
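
How the mask is consumed can be sketched as follows (an illustrative
Python model of the masked comparison only; `reg_eq` stands in for the
verifier's regsafe() check and is an assumption of this sketch):

```python
MAX_BPF_REG = 11  # r0-r10

def regs_equal(old, cur, live_regs, reg_eq):
    """Compare two register states, skipping registers whose bit is
    not set in live_regs (i.e. registers dead before this insn)."""
    for i in range(MAX_BPF_REG):
        if (1 << i) & live_regs and not reg_eq(old[i], cur[i]):
            return False
    return True
```

States that differ only in dead registers now compare equal, which is
what lets iterator-based loop states converge sooner.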

Verification performance impact on selftests and sched_ext is listed
below, using `veristat -C -f "insns_pct>5" -f "!insns<200"` filters.

selftests:

File                  Program                        Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States  (DIFF)
--------------------  -----------------------------  ---------  ---------  ----------------  ----------  ----------  --------------
arena_htab_asm.bpf.o  arena_htab_asm                       445        413      -32 (-7.19%)          37          33    -4 (-10.81%)
arena_list.bpf.o      arena_list_add                      1822        833    -989 (-54.28%)          37          22   -15 (-40.54%)
dynptr_success.bpf.o  test_dynptr_copy                     267        203     -64 (-23.97%)          22          16    -6 (-27.27%)
dynptr_success.bpf.o  test_dynptr_copy_xdp                 719        615    -104 (-14.46%)          68          58   -10 (-14.71%)
iters.bpf.o           checkpoint_states_deletion         22154       1211  -20943 (-94.53%)         918          40  -878 (-95.64%)
iters.bpf.o           clean_live_states                   1348        588    -760 (-56.38%)         136          66   -70 (-51.47%)
iters.bpf.o           iter_nested_deeply_iters             367        300     -67 (-18.26%)          43          37    -6 (-13.95%)
iters.bpf.o           iter_nested_iters                    772        632    -140 (-18.13%)          72          62   -10 (-13.89%)
iters.bpf.o           iter_pass_iter_ptr_to_subprog        285        243     -42 (-14.74%)          30          26    -4 (-13.33%)
iters.bpf.o           iter_subprog_iters                   808        664    -144 (-17.82%)          68          59    -9 (-13.24%)
iters.bpf.o           loop_state_deps2                     356        321      -35 (-9.83%)          35          32     -3 (-8.57%)
iters_css.bpf.o       iter_css_for_each                    296        267      -29 (-9.80%)          32          29     -3 (-9.38%)
pyperf600_iter.bpf.o  on_event                            6379       2591   -3788 (-59.38%)         286         192   -94 (-32.87%)
test_usdt.bpf.o       usdt12                              1983       1803     -180 (-9.08%)         143         136     -7 (-4.90%)

sched_ext:

File               Program                 Insns (A)  Insns (B)  Insns     (DIFF)  States (A)  States (B)  States   (DIFF)
-----------------  ----------------------  ---------  ---------  ----------------  ----------  ----------  ---------------
bpf.bpf.o          lavd_dispatch              154608     120590  -34018 (-22.00%)        8950        7065  -1885 (-21.06%)
bpf.bpf.o          lavd_init                    7330       6935     -395 (-5.39%)         516         480     -36 (-6.98%)
bpf.bpf.o          layered_dispatch             9039       5590   -3449 (-38.16%)         662         501   -161 (-24.32%)
bpf.bpf.o          layered_dump                 5022       3669   -1353 (-26.94%)         298         237    -61 (-20.47%)
bpf.bpf.o          layered_init                 5549       4298   -1251 (-22.54%)         523         423   -100 (-19.12%)
bpf.bpf.o          layered_init_task             270        234     -36 (-13.33%)          24          22      -2 (-8.33%)
bpf.bpf.o          layered_runnable             1899       1635    -264 (-13.90%)         151         125    -26 (-17.22%)
bpf.bpf.o          p2dq_dispatch                 659        533    -126 (-19.12%)          66          53    -13 (-19.70%)
bpf.bpf.o          p2dq_init                    1936       1560    -376 (-19.42%)         170         142    -28 (-16.47%)
bpf.bpf.o          refresh_layer_cpumasks       1285        785    -500 (-38.91%)         120          78    -42 (-35.00%)
bpf.bpf.o          rustland_init                 476        413     -63 (-13.24%)          37          34      -3 (-8.11%)
bpf.bpf.o          rusty_select_cpu             1386       1110    -276 (-19.91%)         125         108    -17 (-13.60%)
bpf.bpf.o          rusty_set_cpumask            4558       4276     -282 (-6.19%)         323         313     -10 (-3.10%)
scx_central.bpf.o  central_dispatch              600        422    -178 (-29.67%)          59          43    -16 (-27.12%)
scx_central.bpf.o  central_init                  632        318    -314 (-49.68%)          39          28    -11 (-28.21%)
scx_nest.bpf.o     nest_init                     601        519     -82 (-13.64%)          58          51     -7 (-12.07%)
scx_pair.bpf.o     pair_dispatch                1914       1376    -538 (-28.11%)         142         111    -31 (-21.83%)
scx_qmap.bpf.o     qmap_dispatch                2187       1703    -484 (-22.13%)         174         141    -33 (-18.97%)
scx_qmap.bpf.o     qmap_init                   22777      18458   -4319 (-18.96%)         768         654   -114 (-14.84%)

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 18 ++++++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 8c23958bc471..39097835b326 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -733,6 +733,7 @@ struct bpf_verifier_env {
 	 * to writes with variable offset and to indirect (helper) accesses.
 	 */
 	bool allow_uninit_stack;
+	bool allow_liveregs_dfa;
 	bool bpf_capable;
 	bool bypass_spec_v1;
 	bool bypass_spec_v4;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ac7dc58d9b1..b6ab49ee31e1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18358,15 +18358,20 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c
  * the current state will reach 'bpf_exit' instruction safely
  */
 static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_state *old,
-			      struct bpf_func_state *cur, enum exact_level exact)
+			      struct bpf_func_state *cur, u32 insn_idx, enum exact_level exact)
 {
-	int i;
+	u16 live_regs;
+	u16 i;
 
 	if (old->callback_depth > cur->callback_depth)
 		return false;
 
+	live_regs = env->allow_liveregs_dfa
+		    ? env->insn_aux_data[insn_idx].live_regs_before
+		    : 0xffff;
 	for (i = 0; i < MAX_BPF_REG; i++)
-		if (!regsafe(env, &old->regs[i], &cur->regs[i],
+		if ((BIT(i) & live_regs) &&
+		    !regsafe(env, &old->regs[i], &cur->regs[i],
 			     &env->idmap_scratch, exact))
 			return false;
 
@@ -18387,6 +18392,7 @@ static bool states_equal(struct bpf_verifier_env *env,
 			 struct bpf_verifier_state *cur,
 			 enum exact_level exact)
 {
+	u32 insn_idx;
 	int i;
 
 	if (old->curframe != cur->curframe)
@@ -18410,9 +18416,12 @@ static bool states_equal(struct bpf_verifier_env *env,
 	 * and all frame states need to be equivalent
 	 */
 	for (i = 0; i <= old->curframe; i++) {
+		insn_idx = i == old->curframe
+			   ? env->insn_idx
+			   : old->frame[i + 1]->callsite;
 		if (old->frame[i]->callsite != cur->frame[i]->callsite)
 			return false;
-		if (!func_states_equal(env, old->frame[i], cur->frame[i], exact))
+		if (!func_states_equal(env, old->frame[i], cur->frame[i], insn_idx, exact))
 			return false;
 	}
 	return true;
@@ -23647,6 +23656,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 		ret = compute_live_registers(env);
 		if (ret < 0)
 			goto skip_full_check;
+		env->allow_liveregs_dfa = true;
 	}
 
 	ret = mark_fastcall_patterns(env);
-- 
2.48.1



* [PATCH bpf-next v1 3/3] selftests/bpf: test cases for compute_live_registers()
  2025-02-28  6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
  2025-02-28  6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
  2025-02-28  6:00 ` [PATCH bpf-next v1 2/3] bpf: use register liveness information for func_states_equal Eduard Zingerman
@ 2025-02-28  6:00 ` Eduard Zingerman
  2025-03-01  2:10 ` [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Alexei Starovoitov
  3 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-02-28  6:00 UTC (permalink / raw)
  To: bpf, ast
  Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song, tj,
	Eduard Zingerman

Cover instructions of each kind:
- assignment
- arithmetic
- store/load
- endian conversion
- atomics
- branches, conditional branches, may_goto, calls
- LD_ABS/LD_IND
- address_space_cast

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 .../bpf/prog_tests/compute_live_registers.c   |   9 +
 tools/testing/selftests/bpf/progs/bpf_misc.h  |  12 +
 .../bpf/progs/compute_live_registers.c        | 397 ++++++++++++++++++
 .../selftests/bpf/progs/verifier_gotol.c      |   6 +-
 .../bpf/progs/verifier_iterating_callbacks.c  |   6 +-
 5 files changed, 420 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/compute_live_registers.c
 create mode 100644 tools/testing/selftests/bpf/progs/compute_live_registers.c

diff --git a/tools/testing/selftests/bpf/prog_tests/compute_live_registers.c b/tools/testing/selftests/bpf/prog_tests/compute_live_registers.c
new file mode 100644
index 000000000000..285f20241fe1
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/compute_live_registers.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "compute_live_registers.skel.h"
+#include "test_progs.h"
+
+void test_compute_live_registers(void)
+{
+	RUN_TESTS(compute_live_registers);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index 34f555da546f..e12e74e7e66e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -213,4 +213,16 @@
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #endif
 
+#if (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) ||	\
+     (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64) ||		\
+     defined(__TARGET_ARCH_arm) || defined(__TARGET_ARCH_s390) ||	\
+     defined(__TARGET_ARCH_loongarch)) &&				\
+	__clang_major__ >= 18
+#define CAN_USE_GOTOL
+#endif
+
+#if __clang_major__ >= 18
+#define CAN_USE_BPF_ST
+#endif
+
 #endif
diff --git a/tools/testing/selftests/bpf/progs/compute_live_registers.c b/tools/testing/selftests/bpf/progs/compute_live_registers.c
new file mode 100644
index 000000000000..f976dec2bb88
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/compute_live_registers.c
@@ -0,0 +1,397 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "../../../include/linux/filter.h"
+#include "bpf_arena_common.h"
+#include "bpf_misc.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} test_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARENA);
+	__uint(map_flags, BPF_F_MMAPABLE);
+	__uint(max_entries, 1);
+} arena SEC(".maps");
+
+SEC("socket")
+__log_level(2)
+__msg(" 0: .......... (b7) r0 = 42")
+__msg(" 1: 0......... (bf) r1 = r0")
+__msg(" 2: .1........ (bf) r2 = r1")
+__msg(" 3: ..2....... (bf) r3 = r2")
+__msg(" 4: ...3...... (bf) r4 = r3")
+__msg(" 5: ....4..... (bf) r5 = r4")
+__msg(" 6: .....5.... (bf) r6 = r5")
+__msg(" 7: ......6... (bf) r7 = r6")
+__msg(" 8: .......7.. (bf) r8 = r7")
+__msg(" 9: ........8. (bf) r9 = r8")
+__msg("10: .........9 (bf) r0 = r9")
+__msg("11: 0......... (95) exit")
+__naked void assign_chain(void)
+{
+	asm volatile (
+		"r0 = 42;"
+		"r1 = r0;"
+		"r2 = r1;"
+		"r3 = r2;"
+		"r4 = r3;"
+		"r5 = r4;"
+		"r6 = r5;"
+		"r7 = r6;"
+		"r8 = r7;"
+		"r9 = r8;"
+		"r0 = r9;"
+		"exit;"
+		::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("0: .......... (b7) r1 = 7")
+__msg("1: .1........ (07) r1 += 7")
+__msg("2: .......... (b7) r2 = 7")
+__msg("3: ..2....... (b7) r3 = 42")
+__msg("4: ..23...... (0f) r2 += r3")
+__msg("5: .......... (b7) r0 = 0")
+__msg("6: 0......... (95) exit")
+__naked void arithmetics(void)
+{
+	asm volatile (
+		"r1 = 7;"
+		"r1 += 7;"
+		"r2 = 7;"
+		"r3 = 42;"
+		"r2 += r3;"
+		"r0 = 0;"
+		"exit;"
+		::: __clobber_all);
+}
+
+#ifdef CAN_USE_BPF_ST
+SEC("socket")
+__log_level(2)
+__msg("  1: .1........ (07) r1 += -8")
+__msg("  2: .1........ (7a) *(u64 *)(r1 +0) = 7")
+__msg("  3: .1........ (b7) r2 = 42")
+__msg("  4: .12....... (7b) *(u64 *)(r1 +0) = r2")
+__msg("  5: .12....... (7b) *(u64 *)(r1 +0) = r2")
+__msg("  6: .......... (b7) r0 = 0")
+__naked void store(void)
+{
+	asm volatile (
+		"r1 = r10;"
+		"r1 += -8;"
+		"*(u64 *)(r1 +0) = 7;"
+		"r2 = 42;"
+		"*(u64 *)(r1 +0) = r2;"
+		"*(u64 *)(r1 +0) = r2;"
+		"r0 = 0;"
+		"exit;"
+		::: __clobber_all);
+}
+#endif
+
+SEC("socket")
+__log_level(2)
+__msg("1: ....4..... (07) r4 += -8")
+__msg("2: ....4..... (79) r5 = *(u64 *)(r4 +0)")
+__msg("3: ....45.... (07) r4 += -8")
+__naked void load(void)
+{
+	asm volatile (
+		"r4 = r10;"
+		"r4 += -8;"
+		"r5 = *(u64 *)(r4 +0);"
+		"r4 += -8;"
+		"r0 = r5;"
+		"exit;"
+		::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("0: .1........ (61) r2 = *(u32 *)(r1 +0)")
+__msg("1: ..2....... (d4) r2 = le64 r2")
+__msg("2: ..2....... (bf) r0 = r2")
+__naked void endian(void)
+{
+	asm volatile (
+		"r2 = *(u32 *)(r1 +0);"
+		"r2 = le64 r2;"
+		"r0 = r2;"
+		"exit;"
+		::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg(" 8: 0......... (b7) r1 = 1")
+__msg(" 9: 01........ (db) r1 = atomic64_fetch_add((u64 *)(r0 +0), r1)")
+__msg("10: 01........ (c3) lock *(u32 *)(r0 +0) += r1")
+__msg("11: 01........ (db) r1 = atomic64_xchg((u64 *)(r0 +0), r1)")
+__msg("12: 01........ (bf) r2 = r0")
+__msg("13: .12....... (bf) r0 = r1")
+__msg("14: .12....... (db) r0 = atomic64_cmpxchg((u64 *)(r2 +0), r0, r1)")
+__naked void atomic(void)
+{
+	asm volatile (
+		"r2 = r10;"
+		"r2 += -8;"
+		"r1 = 0;"
+		"*(u64 *)(r2 +0) = r1;"
+		"r1 = %[test_map] ll;"
+		"call %[bpf_map_lookup_elem];"
+		"if r0 == 0 goto 1f;"
+		"r1 = 1;"
+		"r1 = atomic_fetch_add((u64 *)(r0 +0), r1);"
+		".8byte %[add_nofetch];" /* same as "lock *(u32 *)(r0 +0) += r1;" */
+		"r1 = xchg_64(r0 + 0, r1);"
+		"r2 = r0;"
+		"r0 = r1;"
+		"r0 = cmpxchg_64(r2 + 0, r0, r1);"
+		"1: exit;"
+		:
+		: __imm(bpf_map_lookup_elem),
+		  __imm_addr(test_map),
+		  __imm_insn(add_nofetch, BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_0, BPF_REG_1, 0))
+		: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("4: .12....7.. (85) call bpf_trace_printk#6")
+__msg("5: 0......7.. (0f) r0 += r7")
+__naked void regular_call(void)
+{
+	asm volatile (
+		"r7 = 1;"
+		"r1 = r10;"
+		"r1 += -8;"
+		"r2 = 1;"
+		"call %[bpf_trace_printk];"
+		"r0 += r7;"
+		"exit;"
+		:
+		: __imm(bpf_trace_printk)
+		: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("2: 012....... (25) if r1 > 0x7 goto pc+1")
+__msg("3: ..2....... (bf) r0 = r2")
+__naked void if1(void)
+{
+	asm volatile (
+		"r0 = 1;"
+		"r2 = 2;"
+		"if r1 > 0x7 goto +1;"
+		"r0 = r2;"
+		"exit;"
+		::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("3: 0123...... (2d) if r1 > r3 goto pc+1")
+__msg("4: ..2....... (bf) r0 = r2")
+__naked void if2(void)
+{
+	asm volatile (
+		"r0 = 1;"
+		"r2 = 2;"
+		"r3 = 7;"
+		"if r1 > r3 goto +1;"
+		"r0 = r2;"
+		"exit;"
+		::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("0: .......... (b7) r1 = 0")
+__msg("1: .1........ (b7) r2 = 7")
+__msg("2: .12....... (25) if r1 > 0x7 goto pc+4")
+__msg("3: .12....... (07) r1 += 1")
+__msg("4: .12....... (27) r2 *= 2")
+__msg("5: .12....... (05) goto pc+0")
+__msg("6: .12....... (05) goto pc-5")
+__msg("7: .......... (b7) r0 = 0")
+__msg("8: 0......... (95) exit")
+__naked void loop(void)
+{
+	asm volatile (
+		"r1 = 0;"
+		"r2 = 7;"
+		"if r1 > 0x7 goto +4;"
+		"r1 += 1;"
+		"r2 *= 2;"
+		"goto +0;"
+		"goto -5;"
+		"r0 = 0;"
+		"exit;"
+		:
+		: __imm(bpf_trace_printk)
+		: __clobber_all);
+}
+
+#ifdef CAN_USE_GOTOL
+SEC("socket")
+__log_level(2)
+__msg("2: .123...... (25) if r1 > 0x7 goto pc+2")
+__msg("3: ..2....... (bf) r0 = r2")
+__msg("4: 0......... (06) gotol pc+1")
+__msg("5: ...3...... (bf) r0 = r3")
+__msg("6: 0......... (95) exit")
+__naked void gotol(void)
+{
+	asm volatile (
+		"r2 = 42;"
+		"r3 = 24;"
+		"if r1 > 0x7 goto +2;"
+		"r0 = r2;"
+		"gotol +1;"
+		"r0 = r3;"
+		"exit;"
+		:
+		: __imm(bpf_trace_printk)
+		: __clobber_all);
+}
+#endif
+
+SEC("socket")
+__log_level(2)
+__msg("0: 0......... (b7) r1 = 1")
+__msg("1: 01........ (e5) may_goto pc+1")
+__msg("2: 0......... (05) goto pc-3")
+__msg("3: .1........ (bf) r0 = r1")
+__msg("4: 0......... (95) exit")
+__naked void may_goto(void)
+{
+	asm volatile (
+	"1: r1 = 1;"
+	".8byte %[may_goto];"
+	"goto 1b;"
+	"r0 = r1;"
+	"exit;"
+	:
+	: __imm(bpf_get_smp_processor_id),
+	  __imm_insn(may_goto, BPF_RAW_INSN(BPF_JMP | BPF_JCOND, 0, 0, +1 /* offset */, 0))
+	: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("1: 0......... (18) r2 = 0x7")
+__msg("3: 0.2....... (0f) r0 += r2")
+__naked void ldimm64(void)
+{
+	asm volatile (
+		"r0 = 0;"
+		"r2 = 0x7 ll;"
+		"r0 += r2;"
+		"exit;"
+		:
+		:: __clobber_all);
+}
+
+/* No rules specific for LD_ABS/LD_IND, default behaviour kicks in */
+SEC("socket")
+__log_level(2)
+__msg("2: 0123456789 (30) r0 = *(u8 *)skb[42]")
+__msg("3: 012.456789 (0f) r7 += r0")
+__msg("4: 012.456789 (b7) r3 = 42")
+__msg("5: 0123456789 (50) r0 = *(u8 *)skb[r3 + 0]")
+__msg("6: 0......7.. (0f) r7 += r0")
+__naked void ldabs(void)
+{
+	asm volatile (
+		"r6 = r1;"
+		"r7 = 0;"
+		"r0 = *(u8 *)skb[42];"
+		"r7 += r0;"
+		"r3 = 42;"
+		".8byte %[ld_ind];" /* same as "r0 = *(u8 *)skb[r3];" */
+		"r7 += r0;"
+		"r0 = r7;"
+		"exit;"
+		:
+		: __imm_insn(ld_ind, BPF_LD_IND(BPF_B, BPF_REG_3, 0))
+		: __clobber_all);
+}
+
+
+#ifdef __BPF_FEATURE_ADDR_SPACE_CAST
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__log_level(2)
+__msg(" 6: .12345.... (85) call bpf_arena_alloc_pages")
+__msg(" 7: 0......... (bf) r1 = addr_space_cast(r0, 0, 1)")
+__msg(" 8: .1........ (b7) r2 = 42")
+__naked void addr_space_cast(void)
+{
+	asm volatile (
+		"r1 = %[arena] ll;"
+		"r2 = 0;"
+		"r3 = 1;"
+		"r4 = 0;"
+		"r5 = 0;"
+		"call %[bpf_arena_alloc_pages];"
+		"r1 = addr_space_cast(r0, 0, 1);"
+		"r2 = 42;"
+		"*(u64 *)(r1 +0) = r2;"
+		"r0 = 0;"
+		"exit;"
+		:
+		: __imm(bpf_arena_alloc_pages),
+		  __imm_addr(arena)
+		: __clobber_all);
+}
+#endif
+
+static __used __naked int aux1(void)
+{
+	asm volatile (
+		"r0 = r1;"
+		"r0 += r2;"
+		"exit;"
+		::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("0: ....45.... (b7) r1 = 1")
+__msg("1: .1..45.... (b7) r2 = 2")
+__msg("2: .12.45.... (b7) r3 = 3")
+/* Conservative liveness for subprog parameters. */
+__msg("3: .12345.... (85) call pc+2")
+__msg("4: .......... (b7) r0 = 0")
+__msg("5: 0......... (95) exit")
+__msg("6: .12....... (bf) r0 = r1")
+__msg("7: 0.2....... (0f) r0 += r2")
+/* Conservative liveness for subprog return value. */
+__msg("8: 0......... (95) exit")
+__naked void subprog1(void)
+{
+	asm volatile (
+		"r1 = 1;"
+		"r2 = 2;"
+		"r3 = 3;"
+		"call aux1;"
+		"r0 = 0;"
+		"exit;"
+		::: __clobber_all);
+}
+
+/* to retain debug info for BTF generation */
+void kfunc_root(void)
+{
+	bpf_arena_alloc_pages(0, 0, 0, 0, 0);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_gotol.c b/tools/testing/selftests/bpf/progs/verifier_gotol.c
index 05a329ee45ee..d5d8f24df394 100644
--- a/tools/testing/selftests/bpf/progs/verifier_gotol.c
+++ b/tools/testing/selftests/bpf/progs/verifier_gotol.c
@@ -4,11 +4,7 @@
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"
 
-#if (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
-	(defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64) || \
-	defined(__TARGET_ARCH_arm) || defined(__TARGET_ARCH_s390) || \
-	defined(__TARGET_ARCH_loongarch)) && \
-	__clang_major__ >= 18
+#ifdef CAN_USE_GOTOL
 
 SEC("socket")
 __description("gotol, small_imm")
diff --git a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
index e54bb5385bc1..75dd922e4e9f 100644
--- a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
+++ b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
@@ -407,11 +407,7 @@ l0_%=:	call %[bpf_jiffies64];		\
 	: __clobber_all);
 }
 
-#if (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
-	(defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64) || \
-	defined(__TARGET_ARCH_arm) || defined(__TARGET_ARCH_s390) || \
-	defined(__TARGET_ARCH_loongarch)) && \
-	__clang_major__ >= 18
+#ifdef CAN_USE_GOTOL
 SEC("socket")
 __success __retval(0)
 __naked void gotol_and_may_goto(void)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v1 1/3] bpf: simple DFA-based live registers analysis
  2025-02-28  6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
@ 2025-03-01  2:01   ` Alexei Starovoitov
  2025-03-01  2:09     ` Eduard Zingerman
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2025-03-01  2:01 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo

On Thu, Feb 27, 2025 at 10:01 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Compute may-live registers before each instruction in the program.
> A register is live before instruction I if it is read by I or by some
> instruction S that follows I, provided it is not overwritten between I
> and S.
>
> This information will be used in the next patch as a hint in
> func_states_equal().
>
> Use a simple algorithm described in [1] to compute this information:
> - define the following:
>   - I.use : the set of all registers read by instruction I;
>   - I.def : the set of all registers written by instruction I;
>   - I.in  : the set of all registers that may be alive before I
>             execution;
>   - I.out : the set of all registers that may be alive after I
>             execution;
>   - I.successors : the set of instructions S that might immediately
>                    follow I for some program execution;
> - associate separate empty sets 'I.in' and 'I.out' with each instruction;
> - visit each instruction in a postorder and update the corresponding
>   'I.in' and 'I.out' sets as follows:
>
>       I.out = U [S.in for S in I.successors]
>       I.in  = (I.out / I.def) U I.use
>
>   (where U stands for set union, / stands for set difference)
> - repeat the computation while I.{in,out} changes for any instruction.
>
> On the implementation side, keep things as simple as possible:
> - check_cfg() already marks instructions EXPLORED in post-order,
>   modify it to save the index of each EXPLORED instruction in a vector;
> - represent I.{in,out,use,def} as bitmasks;
> - don't split the program into basic blocks and don't maintain the
>   work queue, instead:
>   - perform fixed-point computation by visiting each instruction;
>   - maintain a simple 'changed' flag to track if I.{in,out} changes
>     for any instruction;
>
>   Measurements show that even this simplistic implementation does not
>   add measurable verification time overhead (at least for selftests).
>
> Note on check_cfg() ex_insn_beg/ex_done change:
> To avoid out of bounds access to env->cfg.insn_postorder array,
> it must be guaranteed that an instruction transitions to the EXPLORED
> state only once. Previously, this was not the case for incorrect
> programs with direct calls to exception callbacks.
>
> The 'align' selftest needs adjustment to skip the computed
> instruction/live registers printout. Otherwise, it matches lines from
> this printout instead of the verification log.
>
> [1] https://en.wikipedia.org/wiki/Live-variable_analysis
>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
>  include/linux/bpf_verifier.h                  |   6 +
>  kernel/bpf/verifier.c                         | 376 ++++++++++++++++--
>  .../testing/selftests/bpf/prog_tests/align.c  |  11 +-
>  3 files changed, 369 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index bbd013c38ff9..8c23958bc471 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -591,6 +591,8 @@ struct bpf_insn_aux_data {
>          * accepts callback function as a parameter.
>          */
>         bool calls_callback;
> +       /* registers alive before this instruction. */
> +       u16 live_regs_before;
>  };
>
>  #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
> @@ -747,7 +749,11 @@ struct bpf_verifier_env {
>         struct {
>                 int *insn_state;
>                 int *insn_stack;
> +               /* vector of instruction indexes sorted in post-order */
> +               int *insn_postorder;
>                 int cur_stack;
> +               /* current position in the insn_postorder vector */
> +               int cur_postorder;
>         } cfg;
>         struct backtrack_state bt;
>         struct bpf_insn_hist_entry *insn_hist;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index dcd0da4e62fc..4ac7dc58d9b1 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3353,6 +3353,15 @@ static int add_subprog_and_kfunc(struct bpf_verifier_env *env)
>         return 0;
>  }
>
> +static int jmp_offset(struct bpf_insn *insn)
> +{
> +       u8 code = insn->code;
> +
> +       if (code == (BPF_JMP32 | BPF_JA))
> +               return insn->imm;
> +       return insn->off;
> +}
> +
>  static int check_subprogs(struct bpf_verifier_env *env)
>  {
>         int i, subprog_start, subprog_end, off, cur_subprog = 0;
> @@ -3379,10 +3388,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
>                         goto next;
>                 if (BPF_OP(code) == BPF_EXIT || BPF_OP(code) == BPF_CALL)
>                         goto next;
> -               if (code == (BPF_JMP32 | BPF_JA))
> -                       off = i + insn[i].imm + 1;
> -               else
> -                       off = i + insn[i].off + 1;
> +               off = i + jmp_offset(&insn[i]) + 1;

Nice cleanup, but pls split it into pre-patch,
so that main liveness logic is in the main patch.

>                 if (off < subprog_start || off >= subprog_end) {
>                         verbose(env, "jump out of range from insn %d to %d\n", i, off);
>                         return -EINVAL;
> @@ -3912,6 +3918,17 @@ static const char *disasm_kfunc_name(void *data, const struct bpf_insn *insn)
>         return btf_name_by_offset(desc_btf, func->name_off);
>  }
>
> +static void verbose_insn(struct bpf_verifier_env *env, struct bpf_insn *insn)
> +{
> +       const struct bpf_insn_cbs cbs = {
> +               .cb_call        = disasm_kfunc_name,
> +               .cb_print       = verbose,
> +               .private_data   = env,
> +       };
> +
> +       print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
> +}
> +
>  static inline void bt_init(struct backtrack_state *bt, u32 frame)
>  {
>         bt->frame = frame;
> @@ -4112,11 +4129,6 @@ static bool calls_callback(struct bpf_verifier_env *env, int insn_idx);
>  static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
>                           struct bpf_insn_hist_entry *hist, struct backtrack_state *bt)
>  {
> -       const struct bpf_insn_cbs cbs = {
> -               .cb_call        = disasm_kfunc_name,
> -               .cb_print       = verbose,
> -               .private_data   = env,
> -       };
>         struct bpf_insn *insn = env->prog->insnsi + idx;
>         u8 class = BPF_CLASS(insn->code);
>         u8 opcode = BPF_OP(insn->code);
> @@ -4134,7 +4146,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
>                 fmt_stack_mask(env->tmp_str_buf, TMP_STR_BUF_LEN, bt_stack_mask(bt));
>                 verbose(env, "stack=%s before ", env->tmp_str_buf);
>                 verbose(env, "%d: ", idx);
> -               print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
> +               verbose_insn(env, insn);

Same. Nice cleanup, but move it into pre-patch with all cleanups.
No need for separate patches per cleanup.
Just all cleanups into one will do.

>         }
>
>         /* If there is a history record that some registers gained range at this insn,
> @@ -11011,6 +11023,9 @@ static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
>         return *ptr ? 0 : -EINVAL;
>  }
>
> +/* Bitmask with 1s for all caller saved registers */
> +#define ALL_CALLER_SAVED_REGS ((1u << CALLER_SAVED_REGS) - 1)
> +
>  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                              int *insn_idx_p)
>  {
> @@ -17246,9 +17261,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
>  static int check_cfg(struct bpf_verifier_env *env)
>  {
>         int insn_cnt = env->prog->len;
> -       int *insn_stack, *insn_state;
> +       int *insn_stack, *insn_state, *insn_postorder;
>         int ex_insn_beg, i, ret = 0;
> -       bool ex_done = false;
>
>         insn_state = env->cfg.insn_state = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
>         if (!insn_state)
> @@ -17260,6 +17274,17 @@ static int check_cfg(struct bpf_verifier_env *env)
>                 return -ENOMEM;
>         }
>
> +       insn_postorder = env->cfg.insn_postorder = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
> +       if (!insn_postorder) {
> +               kvfree(insn_state);
> +               kvfree(insn_stack);
> +               return -ENOMEM;
> +       }
> +
> +       ex_insn_beg = env->exception_callback_subprog
> +                     ? env->subprog_info[env->exception_callback_subprog].start
> +                     : 0;
> +
>         insn_state[0] = DISCOVERED; /* mark 1st insn as discovered */
>         insn_stack[0] = 0; /* 0 is the first instruction */
>         env->cfg.cur_stack = 1;
> @@ -17273,6 +17298,7 @@ static int check_cfg(struct bpf_verifier_env *env)
>                 case DONE_EXPLORING:
>                         insn_state[t] = EXPLORED;
>                         env->cfg.cur_stack--;
> +                       insn_postorder[env->cfg.cur_postorder++] = t;
>                         break;
>                 case KEEP_EXPLORING:
>                         break;
> @@ -17291,13 +17317,10 @@ static int check_cfg(struct bpf_verifier_env *env)
>                 goto err_free;
>         }
>
> -       if (env->exception_callback_subprog && !ex_done) {
> -               ex_insn_beg = env->subprog_info[env->exception_callback_subprog].start;
> -
> +       if (ex_insn_beg && insn_state[ex_insn_beg] != EXPLORED) {
>                 insn_state[ex_insn_beg] = DISCOVERED;
>                 insn_stack[0] = ex_insn_beg;
>                 env->cfg.cur_stack = 1;
> -               ex_done = true;
>                 goto walk_cfg;
>         }
>
> @@ -19121,19 +19144,13 @@ static int do_check(struct bpf_verifier_env *env)
>                 }
>
>                 if (env->log.level & BPF_LOG_LEVEL) {
> -                       const struct bpf_insn_cbs cbs = {
> -                               .cb_call        = disasm_kfunc_name,
> -                               .cb_print       = verbose,
> -                               .private_data   = env,
> -                       };
> -
>                         if (verifier_state_scratched(env))
>                                 print_insn_state(env, state, state->curframe);
>
>                         verbose_linfo(env, env->insn_idx, "; ");
>                         env->prev_log_pos = env->log.end_pos;
>                         verbose(env, "%d: ", env->insn_idx);
> -                       print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
> +                       verbose_insn(env, insn);
>                         env->prev_insn_print_pos = env->log.end_pos - env->prev_log_pos;
>                         env->prev_log_pos = env->log.end_pos;
>                 }
> @@ -23199,6 +23216,312 @@ static int process_fd_array(struct bpf_verifier_env *env, union bpf_attr *attr,
>         return 0;
>  }
>
> +static bool can_fallthrough(struct bpf_insn *insn)
> +{
> +       u8 class = BPF_CLASS(insn->code);
> +       u8 opcode = BPF_OP(insn->code);
> +
> +       if (class != BPF_JMP && class != BPF_JMP32)
> +               return true;
> +
> +       if (opcode == BPF_EXIT || opcode == BPF_JA)
> +               return false;
> +
> +       return true;
> +}
> +
> +static bool can_jump(struct bpf_insn *insn)
> +{
> +       u8 class = BPF_CLASS(insn->code);
> +       u8 opcode = BPF_OP(insn->code);
> +
> +       if (class != BPF_JMP && class != BPF_JMP32)
> +               return false;
> +
> +       switch (opcode) {
> +       case BPF_JA:
> +       case BPF_JEQ:
> +       case BPF_JNE:
> +       case BPF_JLT:
> +       case BPF_JLE:
> +       case BPF_JGT:
> +       case BPF_JGE:
> +       case BPF_JSGT:
> +       case BPF_JSGE:
> +       case BPF_JSLT:
> +       case BPF_JSLE:
> +       case BPF_JCOND:
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
> +static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> +{
> +       struct bpf_insn *insn = &prog->insnsi[idx];
> +       int i = 0, insn_sz;
> +       u32 dst;
> +
> +       succ[0] = prog->len;
> +       succ[1] = prog->len;

Why initialize them? They won't be used anyway.

> +
> +       insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
> +       if (can_fallthrough(insn) && idx + 1 < prog->len)
> +               succ[i++] = idx + insn_sz;
> +
> +       if (can_jump(insn)) {
> +               dst = idx + jmp_offset(insn) + 1;
> +               if (i == 0 || succ[0] != dst)
> +                       succ[i++] = dst;
> +       }
> +
> +       return i;
> +}
> +
> +/* Each field is a register bitmask */
> +struct insn_live_regs {
> +       u16 use;        /* registers read by instruction */
> +       u16 def;        /* registers written by instruction */
> +       u16 in;         /* registers that may be alive before instruction */
> +       u16 out;        /* registers that may be alive after instruction */
> +};
> +
> +/* Compute *use and *def values for the call instruction */
> +static void compute_call_live_regs(struct bpf_verifier_env *env,
> +                                  struct bpf_insn *insn,
> +                                  u16 *use, u16 *def)
> +{
> +       struct bpf_kfunc_call_arg_meta meta;
> +       const struct bpf_func_proto *fn;
> +       int err, i, nargs;
> +
> +       *def = ALL_CALLER_SAVED_REGS;
> +       *use = *def & ~BIT(BPF_REG_0);
> +       if (bpf_helper_call(insn)) {
> +               err = get_helper_proto(env, insn->imm, &fn);
> +               if (err)
> +                       return;
> +               *use = 0;
> +               for (i = 1; i < CALLER_SAVED_REGS; i++) {
> +                       if (fn->arg_type[i - 1] == ARG_DONTCARE)
> +                               break;
> +                       *use |= BIT(i);
> +               }
> +       } else if (bpf_pseudo_kfunc_call(insn)) {
> +               err = fetch_kfunc_meta(env, insn, &meta, NULL);
> +               if (err)
> +                       return;
> +               nargs = btf_type_vlen(meta.func_proto);
> +               *use = 0;
> +               for (i = 1; i <= nargs; i++)
> +                       *use |= BIT(i);
> +       }
> +}

The helper is very close to helper_fastcall_clobber_mask()
and kfunc_fastcall_clobber_mask(),
and the beginning of mark_fastcall_pattern_for_call().
Let's generalize the code and make it a pre-patch 2.
(separate from trivial cleanups).

In generic helper I'd standardize on 'use' word instead
of 'clobber_mask'.

The move of ALL_CALLER_SAVED_REGS can be there as well.

> +
> +/* Compute info->{use,def} fields for the instruction */
> +static void compute_insn_live_regs(struct bpf_verifier_env *env,
> +                                  struct bpf_insn *insn,
> +                                  struct insn_live_regs *info)
> +{
> +       u8 class = BPF_CLASS(insn->code);
> +       u8 code = BPF_OP(insn->code);
> +       u8 mode = BPF_MODE(insn->code);
> +       u16 src = BIT(insn->src_reg);
> +       u16 dst = BIT(insn->dst_reg);
> +       u16 r0  = BIT(0);
> +       u16 def = 0;
> +       u16 use = 0xffff;
> +
> +       switch (class) {
> +       case BPF_LD:
> +               switch (mode) {
> +               case BPF_IMM:
> +                       if (BPF_SIZE(insn->code) == BPF_DW) {
> +                               def = dst;
> +                               use = 0;
> +                       }
> +                       break;
> +               case BPF_LD | BPF_ABS:
> +               case BPF_LD | BPF_IND:
> +                       /* stick with defaults */
> +                       break;
> +               }
> +               break;
> +       case BPF_LDX:
> +               switch (mode) {
> +               case BPF_MEM:
> +               case BPF_MEMSX:
> +                       def = dst;
> +                       use = src;
> +                       break;
> +               }
> +               break;
> +       case BPF_ST:
> +               switch (mode) {
> +               case BPF_MEM:
> +                       def = 0;
> +                       use = dst;
> +                       break;
> +               }
> +               break;
> +       case BPF_STX:
> +               switch (mode) {
> +               case BPF_MEM:
> +                       def = 0;
> +                       use = dst | src;
> +                       break;
> +               case BPF_ATOMIC:
> +                       use = dst | src;
> +                       if (insn->imm & BPF_FETCH) {
> +                               if (insn->imm == BPF_CMPXCHG)
> +                                       def = r0;
> +                               else
> +                                       def = src;
> +                       } else {
> +                               def = 0;
> +                       }
> +                       break;
> +               }
> +               break;
> +       case BPF_ALU:
> +       case BPF_ALU64:
> +               switch (code) {
> +               case BPF_END:
> +                       use = dst;
> +                       def = dst;
> +                       break;
> +               case BPF_MOV:
> +                       def = dst;
> +                       if (BPF_SRC(insn->code) == BPF_K)
> +                               use = 0;
> +                       else
> +                               use = src;
> +                       break;
> +               default:
> +                       def = dst;
> +                       if (BPF_SRC(insn->code) == BPF_K)
> +                               use = dst;
> +                       else
> +                               use = dst | src;
> +               }
> +               break;
> +       case BPF_JMP:
> +       case BPF_JMP32:
> +               switch (code) {
> +               case BPF_JA:
> +                       def = 0;
> +                       use = 0;
> +                       break;
> +               case BPF_EXIT:
> +                       def = 0;
> +                       use = r0;
> +                       break;
> +               case BPF_CALL:
> +                       compute_call_live_regs(env, insn, &use, &def);
> +                       break;
> +               default:
> +                       def = 0;
> +                       if (BPF_SRC(insn->code) == BPF_K)
> +                               use = dst;
> +                       else
> +                               use = dst | src;
> +               }
> +               break;
> +       }
> +
> +       info->def = def;
> +       info->use = use;
> +}
> +
> +/* Compute may-live registers before each instruction in the program.
> + * A register is live before instruction I if it is read by I or by some
> + * instruction S that follows I, provided it is not overwritten between I
> + * and S.
> + *
> + * Store result in env->insn_aux_data[i].live_regs.
> + */
> +static int compute_live_registers(struct bpf_verifier_env *env)
> +{
> +       struct bpf_insn_aux_data *insn_aux = env->insn_aux_data;
> +       struct bpf_insn *insns = env->prog->insnsi;
> +       struct insn_live_regs *state;
> +       int insn_cnt = env->prog->len;
> +       int err = 0, i, j;
> +       bool changed;
> +
> +       /* Use simple algorithm desribed in:

described

> +        * https://en.wikipedia.org/wiki/Live-variable_analysis

instead of the link let's copy paste definition of
I.use|def from commit log into this comment.

> +        *
> +        * - visit each instruction in a postorder and update
> +        *   state[i].in, state[i].out as follows:
> +        *
> +        *       state[i].out = U [state[s].in for S in insn_successors(i)]
> +        *       state[i].in  = (state[i].out / state[i].def) U state[i].use
> +        *
> +        *   (where U stands for set union, / stands for set difference)
> +        * - repeat the computation while {in,out} fields change for
> +        *   any instruction.
> +        */
> +       state = kvcalloc(insn_cnt, sizeof(*state), GFP_KERNEL);
> +       if (!state) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       for (i = 0; i < insn_cnt; ++i)
> +               compute_insn_live_regs(env, &insns[i], &state[i]);
> +
> +       changed = true;
> +       while (changed) {
> +               changed = false;
> +               for (i = 0; i < env->cfg.cur_postorder; ++i) {
> +                       int insn_idx = env->cfg.insn_postorder[i];
> +                       struct insn_live_regs *live = &state[insn_idx];
> +                       int succ_num;
> +                       u32 succ[2];
> +                       u16 new_out = 0;
> +                       u16 new_in = 0;
> +
> +                       succ_num = insn_successors(env->prog, insn_idx, succ);
> +                       for (int s = 0; s < succ_num; ++s)
> +                               new_out |= state[succ[s]].in;
> +                       new_in = (new_out & ~live->def) | live->use;
> +                       if (new_out != live->out || new_in != live->in) {
> +                               live->in = new_in;
> +                               live->out = new_out;
> +                               changed = true;
> +                       }
> +               }
> +       }
> +
> +       for (i = 0; i < insn_cnt; ++i)
> +               insn_aux[i].live_regs_before = state[i].in;
> +
> +       if (env->log.level & BPF_LOG_LEVEL2) {
> +               verbose(env, "Live regs before insn:\n");
> +               for (i = 0; i < insn_cnt; ++i) {
> +                       verbose(env, "%3d: ", i);
> +                       for (j = BPF_REG_0; j < BPF_REG_10; ++j)
> +                               if (insn_aux[i].live_regs_before & BIT(j))
> +                                       verbose(env, "%d", j);
> +                               else
> +                                       verbose(env, ".");

I wonder whether we would need to see liveness at log_level=2
during the normal verifier output instead of once at the beginning.
I guess it's fine doing it once like you did.

> +                       verbose(env, " ");
> +                       verbose_insn(env, &insns[i]);
> +                       if (bpf_is_ldimm64(&insns[i]))
> +                               i++;
> +               }
> +       }
> +
> +out:
> +       kvfree(state);
> +       kvfree(env->cfg.insn_postorder);
> +       env->cfg.insn_postorder = NULL;
> +       env->cfg.cur_postorder = 0;
> +       return err;
> +}
> +
>  int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
>  {
>         u64 start_time = ktime_get_ns();
> @@ -23320,6 +23643,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
>         if (ret)
>                 goto skip_full_check;
>
> +       if (is_priv) {
> +               ret = compute_live_registers(env);
> +               if (ret < 0)
> +                       goto skip_full_check;
> +       }
> +
>         ret = mark_fastcall_patterns(env);
>         if (ret < 0)
>                 goto skip_full_check;
> @@ -23458,6 +23787,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
>         vfree(env->insn_aux_data);
>         kvfree(env->insn_hist);
>  err_free_env:
> +       kvfree(env->cfg.insn_postorder);
>         kvfree(env);
>         return ret;
>  }
> diff --git a/tools/testing/selftests/bpf/prog_tests/align.c b/tools/testing/selftests/bpf/prog_tests/align.c
> index 4ebd0da898f5..1d53a8561ee2 100644
> --- a/tools/testing/selftests/bpf/prog_tests/align.c
> +++ b/tools/testing/selftests/bpf/prog_tests/align.c
> @@ -610,9 +610,11 @@ static int do_test_single(struct bpf_align_test *test)
>                 .log_size = sizeof(bpf_vlog),
>                 .log_level = 2,
>         );
> +       const char *main_pass_start = "0: R1=ctx() R10=fp0";
>         const char *line_ptr;
>         int cur_line = -1;
>         int prog_len, i;
> +       char *start;
>         int fd_prog;
>         int ret;
>
> @@ -632,7 +634,13 @@ static int do_test_single(struct bpf_align_test *test)
>                 ret = 0;
>                 /* We make a local copy so that we can strtok() it */
>                 strncpy(bpf_vlog_copy, bpf_vlog, sizeof(bpf_vlog_copy));
> -               line_ptr = strtok(bpf_vlog_copy, "\n");
> +               start = strstr(bpf_vlog_copy, main_pass_start);
> +               if (!start) {
> +                       ret = 1;
> +                       printf("Can't find initial line '%s'\n", main_pass_start);
> +                       goto out;
> +               }
> +               line_ptr = strtok(start, "\n");
>                 for (i = 0; i < MAX_MATCHES; i++) {
>                         struct bpf_reg_match m = test->matches[i];
>                         const char *p;
> @@ -682,6 +690,7 @@ static int do_test_single(struct bpf_align_test *test)
>                                 break;
>                         }
>                 }
> +out:
>                 if (fd_prog >= 0)
>                         close(fd_prog);
>         }
> --
> 2.48.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v1 1/3] bpf: simple DFA-based live registers analysis
  2025-03-01  2:01   ` Alexei Starovoitov
@ 2025-03-01  2:09     ` Eduard Zingerman
  0 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-03-01  2:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo

On Fri, 2025-02-28 at 18:01 -0800, Alexei Starovoitov wrote:
> On Thu, Feb 27, 2025 at 10:01 PM Eduard Zingerman <eddyz87@gmail.com> wrote:

[...]

> > @@ -3379,10 +3388,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
> >                         goto next;
> >                 if (BPF_OP(code) == BPF_EXIT || BPF_OP(code) == BPF_CALL)
> >                         goto next;
> > -               if (code == (BPF_JMP32 | BPF_JA))
> > -                       off = i + insn[i].imm + 1;
> > -               else
> > -                       off = i + insn[i].off + 1;
> > +               off = i + jmp_offset(&insn[i]) + 1;
> 
> Nice cleanup, but pls split it into pre-patch,
> so that main liveness logic is in the main patch.

Thanks for the feedback.
I'll include all requested changes in v2.

[...]



* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
  2025-02-28  6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
                   ` (2 preceding siblings ...)
  2025-02-28  6:00 ` [PATCH bpf-next v1 3/3] selftests/bpf: test cases for compute_live_registers() Eduard Zingerman
@ 2025-03-01  2:10 ` Alexei Starovoitov
  2025-03-01  4:40   ` Eduard Zingerman
  3 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2025-03-01  2:10 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo

On Thu, Feb 27, 2025 at 10:00 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> This patch set introduces a simple live register DFA analysis.
> The analysis is performed as a separate step before the main
> verification pass, and results are stored in `env->insn_aux_data` for
> each instruction.
>
> This change improves handling of iterator/callback-based loops, as
> regular register liveness marks are not finalized while loops are
> being processed. See veristat results for selftests and sched_ext in
> patch #2.
>
> The patch set was tested in branch [1] by disabling the current
> register parentage chain liveness computation, using DFA-based
> liveness for registers while assuming all stack slots as live.
> No notable regressions were found in test_progs-based tests.

I think the end goal is to get rid of mark_reg_read() and
switch to proper live reg analysis.
So please include the numbers to see how much work is left.

Also note that mark_reg_read() tracks 32 vs 64 reads separately.
iirc we did it to support fine-grained mark_insn_zext
to help architectures where zext has to be inserted by JIT.
I'm not sure whether new liveness has to do it as well.


* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
  2025-03-01  2:10 ` [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Alexei Starovoitov
@ 2025-03-01  4:40   ` Eduard Zingerman
  2025-03-02  0:09     ` Alexei Starovoitov
  0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2025-03-01  4:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo

On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:

[...]

> I think the end goal is to get rid of mark_reg_read() and
> switch to proper live reg analysis.
> So please include the numbers to see how much work left.

Complete removal of mark_reg_read() means that analysis needs to be
done for stack slots as well. The algorithm to handle stack slots is
much more complicated:
- it needs to track register / stack slot type to handle cases like
  "r1 = r10" and spills of the stack pointer to stack;
- it needs to track register values, at least crudely, to handle cases
  like "r1 = r10; r1 += r2;" (array access).

The worst-case scenario, as you suggested, is just to assume all stack
slots are live, but that is a big verification performance hit.
Exact numbers are at the end of the email.

> Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> iirc we did it to support fine grain mark_insn_zext
> to help architectures where zext has to be inserted by JIT.
> I'm not sure whether new liveness has to do it as well.

As far as I understand, this is important for one check in
propagate_liveness(). And that check means something like:
"if this register was read as a 64-bit value, remember that
 it needs zero extension on a 32-bit load".

Meaning that either DFA would need to track this bit of information
(should be simple), or more zero extensions would be added.

---

Repository [1] shared in cover letter was used for benchmarks below.
Abbreviations are as follows:
- Name: dfa-opts
  Commit: b73005452a4a
  Meaning: DFA as shared in this patch-set + a set of small
           improvements which I decided to exclude from the
           patch-set as described in the cover letter.
- Name: dfa-opts-no-rm
  Commit: e486757fdada
  Meaning: dfa-opts + read marks are disabled for registers.
- Name: dfa-opts-no-rm-sl
  Commit: a9930e8127a9
  Meaning: dfa-opts + read marks are disabled for registers
           and stack.

[1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off

Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
Veristat results are followed by a histogram that accounts for all
tests.

Two comparisons are made:
- dfa-opts vs dfa-opts-no-rm (small negative impact, except two
  sched_ext programs that hit 1M instructions limit; positive impact
  would have indicated a bug);
- dfa-opts vs dfa-opts-no-rm-sl (big negative impact).

========= selftests: dfa-opts vs dfa-opts-no-rm =========

File                      Program           States (A)  States (B)  States (DIFF)
------------------------  ----------------  ----------  ----------  -------------
test_l4lb_noinline.bpf.o  balancer_ingress         219         231   +12 (+5.48%)

Total progs: 3565
Old success: 2054
New success: 2054
States diff min:    0.00%
States diff max:    5.48%
   0% ..    5%: 3564
   5% ..   10%: 1

========= scx: dfa-opts vs dfa-opts-no-rm =========

File       Program          States (A)  States (B)  States      (DIFF)
---------  ---------------  ----------  ----------  ------------------
bpf.bpf.o  rusty_init             1944       55004  +53060 (+2729.42%)
bpf.bpf.o  rusty_init_task        1732       55049  +53317 (+3078.35%)

Total progs: 216
Old success: 186
New success: 184
States diff min:    0.00%
States diff max: 3078.35%
   0% ..    5%: 214
2725% .. 3080%: 2



========= selftests: dfa-opts vs dfa-opts-no-rm-sl =========

File                              Program                               States (A)  States (B)  States     (DIFF)
--------------------------------  ------------------------------------  ----------  ----------  -----------------
arena_htab_asm.bpf.o              arena_htab_asm                                33          40       +7 (+21.21%)
bpf_cubic.bpf.o                   bpf_cubic_cong_avoid                          92          98        +6 (+6.52%)
bpf_flow.bpf.o                    flow_dissector_0                              66         125      +59 (+89.39%)
bpf_iter_ksym.bpf.o               dump_ksym                                     16          21       +5 (+31.25%)
profiler1.bpf.o                   kprobe__proc_sys_write                        84         140      +56 (+66.67%)
profiler1.bpf.o                   kprobe__vfs_link                             504         543       +39 (+7.74%)
profiler1.bpf.o                   kprobe__vfs_symlink                          238         466     +228 (+95.80%)
profiler1.bpf.o                   kprobe_ret__do_filp_open                     247         274      +27 (+10.93%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exec           139         350    +211 (+151.80%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exit            67          86      +19 (+28.36%)
profiler1.bpf.o                   tracepoint__syscalls__sys_enter_kill         649         758     +109 (+16.80%)
profiler2.bpf.o                   kprobe__vfs_link                             149         257     +108 (+72.48%)
profiler2.bpf.o                   kprobe_ret__do_filp_open                     106         120      +14 (+13.21%)
profiler2.bpf.o                   raw_tracepoint__sched_process_exec           126         140      +14 (+11.11%)
profiler3.bpf.o                   kprobe__vfs_link                             805        1182     +377 (+46.83%)
pyperf180.bpf.o                   on_event                                   10564       17659    +7095 (+67.16%)
pyperf50.bpf.o                    on_event                                    2489        3375     +886 (+35.60%)
pyperf600_iter.bpf.o              on_event                                     192         214      +22 (+11.46%)
pyperf_subprogs.bpf.o             on_event                                    2331        2514      +183 (+7.85%)
setget_sockopt.bpf.o              skops_sockopt                                429         458       +29 (+6.76%)
setget_sockopt.bpf.o              socket_post_create                            90          95        +5 (+5.56%)
sock_iter_batch.bpf.o             iter_tcp_soreuse                               3           5       +2 (+66.67%)
strobemeta_bpf_loop.bpf.o         on_event                                     209         331     +122 (+58.37%)
test_bpf_nf.bpf.o                 nf_skb_ct_test                                41          56      +15 (+36.59%)
test_bpf_nf.bpf.o                 nf_xdp_ct_test                                41          56      +15 (+36.59%)
test_cls_redirect.bpf.o           cls_redirect                                2175       14083  +11908 (+547.49%)
test_cls_redirect_dynptr.bpf.o    cls_redirect                                 220         327     +107 (+48.64%)
test_cls_redirect_subprogs.bpf.o  cls_redirect                                4390       17001  +12611 (+287.27%)
test_l4lb.bpf.o                   balancer_ingress                             137         256     +119 (+86.86%)
test_l4lb_noinline.bpf.o          balancer_ingress                             219         643    +424 (+193.61%)
test_l4lb_noinline_dynptr.bpf.o   balancer_ingress                              73         182    +109 (+149.32%)
test_misc_tcp_hdr_options.bpf.o   misc_estab                                    88          98      +10 (+11.36%)
test_pkt_access.bpf.o             test_pkt_access                               21          25       +4 (+19.05%)
test_sock_fields.bpf.o            egress_read_sock_fields                       20          29       +9 (+45.00%)
test_tc_neigh_fib.bpf.o           tc_dst                                        12          14       +2 (+16.67%)
test_tc_neigh_fib.bpf.o           tc_src                                        12          14       +2 (+16.67%)
test_tcp_custom_syncookie.bpf.o   tcp_custom_syncookie                         420         560     +140 (+33.33%)
test_tcp_hdr_options.bpf.o        estab                                        189         225      +36 (+19.05%)
test_xdp.bpf.o                    _xdp_tx_iptunnel                              17          18        +1 (+5.88%)
test_xdp_dynptr.bpf.o             _xdp_tx_iptunnel                              26          36      +10 (+38.46%)
test_xdp_loop.bpf.o               _xdp_tx_iptunnel                              19          20        +1 (+5.26%)
test_xdp_noinline.bpf.o           balancer_ingress_v4                          271        1080    +809 (+298.52%)
test_xdp_noinline.bpf.o           balancer_ingress_v6                          268        1030    +762 (+284.33%)
xdp_features.bpf.o                xdp_do_tx                                     10          13       +3 (+30.00%)
xdp_synproxy_kern.bpf.o           syncookie_tc                                 390         467      +77 (+19.74%)
xdp_synproxy_kern.bpf.o           syncookie_xdp                                384         450      +66 (+17.19%)

Total progs: 3565
Old success: 2054
New success: 2054
States diff min:   -9.09%
States diff max:  547.49%
 -10% ..    0%: 3
   0% ..    5%: 3492
   5% ..   10%: 10
  10% ..   15%: 8
  15% ..   20%: 10
  20% ..   25%: 6
  25% ..   35%: 8
  35% ..   40%: 4
  45% ..   50%: 3
  50% ..   55%: 4
  55% ..   70%: 4
  70% ..   90%: 3
  95% ..  105%: 3
 145% ..  195%: 3
 280% ..  300%: 3
 545% ..  550%: 1

========= scx: dfa-opts vs dfa-opts-no-rm-sl =========

File            Program             States (A)  States (B)  States      (DIFF)
--------------  ------------------  ----------  ----------  ------------------
bpf.bpf.o       bpfland_enqueue             18          20        +2 (+11.11%)
bpf.bpf.o       bpfland_select_cpu          83         103       +20 (+24.10%)
bpf.bpf.o       flash_select_cpu            30          49       +19 (+63.33%)
bpf.bpf.o       lavd_cpu_offline           303         360       +57 (+18.81%)
bpf.bpf.o       lavd_cpu_online            303         360       +57 (+18.81%)
bpf.bpf.o       lavd_dispatch             7065       10652     +3587 (+50.77%)
bpf.bpf.o       lavd_init                  480         554       +74 (+15.42%)
bpf.bpf.o       lavd_running                89          94         +5 (+5.62%)
bpf.bpf.o       lavd_select_cpu            451         483        +32 (+7.10%)
bpf.bpf.o       layered_dispatch           501         950      +449 (+89.62%)
bpf.bpf.o       layered_dump               237         258        +21 (+8.86%)
bpf.bpf.o       layered_enqueue           1290        1655      +365 (+28.29%)
bpf.bpf.o       layered_init               423         552      +129 (+30.50%)
bpf.bpf.o       layered_select_cpu         201         311      +110 (+54.73%)
bpf.bpf.o       p2dq_dispatch               53         116      +63 (+118.87%)
bpf.bpf.o       rusty_init                1944       55006  +53062 (+2729.53%)
bpf.bpf.o       rusty_init_task           1732       55052  +53320 (+3078.52%)
bpf.bpf.o       rusty_running               19          23        +4 (+21.05%)
bpf.bpf.o       rusty_select_cpu           108         227     +119 (+110.19%)
bpf.bpf.o       rusty_set_cpumask          313         479      +166 (+53.04%)
scx_nest.bpf.o  nest_select_cpu             49          53         +4 (+8.16%)

Total progs: 216
Old success: 186
New success: 184
States diff min:    0.00%
States diff max: 3078.52%
   0% ..    5%: 186
   5% ..   10%: 4
  10% ..   15%: 5
  15% ..   20%: 6
  20% ..   25%: 3
  25% ..   55%: 6
  60% ..  115%: 3
 115% .. 3080%: 3



* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
  2025-03-01  4:40   ` Eduard Zingerman
@ 2025-03-02  0:09     ` Alexei Starovoitov
  2025-03-03 19:28       ` Eduard Zingerman
  2025-03-05  9:00       ` Eduard Zingerman
  0 siblings, 2 replies; 11+ messages in thread
From: Alexei Starovoitov @ 2025-03-02  0:09 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo

On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:
>
> [...]
>
> > I think the end goal is to get rid of mark_reg_read() and
> > switch to proper live reg analysis.
> > So please include the numbers to see how much work left.
>
> Complete removal of mark_reg_read() means that analysis needs to be
> done for stack slots as well. The algorithm to handle stack slots is
> much more complicated:
> - it needs to track register / stack slot type to handle cases like
>   "r1 = r10" and spills of the stack pointer to stack;
> > - it needs to track register values, at least crudely, to handle cases
>   like "r1 = r10; r1 += r2;" (array access).

Doing this kind of register movement tracking before do_check()
may be difficult indeed.
Can we do this use/def tracking inline, similar to the current liveness,
but without the ->parent logic, using the postorder array that this
patch adds?

Verifier states are path-sensitive and more accurate, while this one
will be insn-based, but maybe that is good enough?

> The worst case scenario, as you suggested, is just to assume stack
> slots live, but it is a big verification performance hit.
> Exact numbers are at the end of the email.
>
> > Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> > iirc we did it to support fine grain mark_insn_zext
> > to help architectures where zext has to be inserted by JIT.
> > I'm not sure whether new liveness has to do it as well.
>
> As far as I understand, this is important for one check in
> propagate_liveness(). And that check means something like:
> "if this register was read as 64-bit value, remember that
>  it needs zero extension on 32-bit load".
>
> Meaning that either DFA would need to track this bit of information
> (should be simple), or more zero extensions would be added.

Right. New liveness doesn't break zext, but makes it worse
for archs that need it. We would need to quantify the impact.
iirc it was noticeable enough that we added this support.

>
> ---
>
> Repository [1] shared in cover letter was used for benchmarks below.
> Abbreviations are as follows:
> - Name: dfa-opts
>   Commit: b73005452a4a
>   Meaning: DFA as shared in this patch-set + a set of small
>            improvements which I decided to exclude from the
>            patch-set as described in the cover letter.
> - Name: dfa-opts-no-rm
>   Commit: e486757fdada
>   Meaning: dfa-opts + read marks are disabled for registers.
> - Name: dfa-opts-no-rm-sl
>   Commit: a9930e8127a9
>   Meaning: dfa-opts + read marks are disabled for registers
>            and stack.
>
> [1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off
>
> Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
> Veristat results are followed by a histogram that accounts for all
> tests.
>
> Two comparisons are made:
> - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
>   sched_ext programs that hit 1M instructions limit; positive impact
>   would have indicated a bug);

Let's figure out what is causing rusty_init[_task]
to explode.
And proceed with this set in parallel.

> - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).

I don't read it as a big negative.
cls_redirect and balancer_ingress need to be understood,
but nothing exploded to hit 1M insns,
so hopefully bare minimum stack tracking would do the trick.

So in terms of priorities, let's land this set, then
figure out rusty_init,
figure out read32 vs 64 for zext,
at that time we may land -no-rm.
Then stack tracking.


* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
  2025-03-02  0:09     ` Alexei Starovoitov
@ 2025-03-03 19:28       ` Eduard Zingerman
  2025-03-05  9:00       ` Eduard Zingerman
  1 sibling, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-03-03 19:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo

On Sat, 2025-03-01 at 16:09 -0800, Alexei Starovoitov wrote:
> On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@gmail.com> wrote:

[...]

> > Complete removal of mark_reg_read() means that analysis needs to be
> > done for stack slots as well. The algorithm to handle stack slots is
> > much more complicated:
> > - it needs to track register / stack slot type to handle cases like
> >   "r1 = r10" and spills of the stack pointer to stack;
> > - it needs to track register values, at-least crudely, to handle cases
> >   like "r1 = r10; r1 += r2;" (array access).
> 
> Doing this kind of register movement tracking before do_check()
> may be difficult indeed.
> Can we do this use/def tracking inline similar to current liveness,
> but without ->parent logic.
> Using postorder array that this patch adds ?
> verifier states are path sensitive and more accurate
> while this one will be insn based, but maybe good enough ?

You mean act like precision tracking? Whenever an instruction is
verified and a use is recorded, propagate this use upwards along the
execution path, updating live-in/live-out sets for instructions?

The problem here is termination (when to consider live-in/live-out
sets final). DFA computation stops as soon as live-in/live-out marks
stop changing. Idk how this condition should look for the scheme
above.

[...]

> > > Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> > > iirc we did it to support fine grain mark_insn_zext
> > > to help architectures where zext has to be inserted by JIT.
> > > I'm not sure whether new liveness has to do it as well.
> > 
> > As far as I understand, this is important for one check in
> > propagate_liveness(). And that check means something like:
> > "if this register was read as 64-bit value, remember that
> >  it needs zero extension on 32-bit load".
> > 
> > Meaning that either DFA would need to track this bit of information
> > (should be simple), or more zero extensions would be added.
> 
> Right. New liveness doesn't break zext, but makes it worse
> for arch that needs it. We would need to quantify the impact.
> iirc it was noticeable enough that we added this support.

I'm surprised that no test_progs or test_verifier tests are broken.
Agree that this needs to be quantified.

[...]

> > Two comparisons are made:
> > - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
> >   sched_ext programs that hit 1M instructions limit; positive impact
> >   would have indicated a bug);
> 
> Let's figure out what is causing rusty_init[_task]
> to explode.
> And proceed with this set in parallel.

Will do.

> > - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
> 
> I don't read it as a big negative.
> cls_redirect and balancer_ingress need to be understood,
> but nothing exploded to hit 1M insns,
> so hopefully bare minimum stack tracking would do the trick.
> 
> So in terms of priorities, let's land this set, then
> figure out rusty_init,
> figure out read32 vs 64 for zext,
> at that time we may land -no-rm.
> Then stack tracking.

Tbh, I think that landing dfa-opts-no-rm separately from
dfa-opts-no-rm-sl doesn't make things much simpler.
The register-chain-based liveness computation would still be a thing.
I'd try to figure out how to make the dfa-opts-no-rm-sl variant faster
first.



* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
  2025-03-02  0:09     ` Alexei Starovoitov
  2025-03-03 19:28       ` Eduard Zingerman
@ 2025-03-05  9:00       ` Eduard Zingerman
  1 sibling, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-03-05  9:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo

On Sat, 2025-03-01 at 16:09 -0800, Alexei Starovoitov wrote:
> On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@gmail.com> wrote:

[...]

> > Two comparisons are made:
> > - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
> >   sched_ext programs that hit 1M instructions limit; positive impact
> >   would have indicated a bug);
> 
> Let's figure out what is causing rusty_init[_task]
> to explode.
> And proceed with this set in parallel.

The regression for rusty_init was caused by incorrectly marking "r0"
as used because of the "may_goto" instruction. This is fixed by:
https://lore.kernel.org/bpf/20250305085436.2731464-1-eddyz87@gmail.com/

> > - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
> 
> I don't read it as a big negative.
> cls_redirect and balancer_ingress need to be understood,
> but nothing exploded to hit 1M insns,
> so hopefully bare minimum stack tracking would do the trick.
> 
> So in terms of priorities, let's land this set, then
> figure out rusty_init,
> figure out read32 vs 64 for zext,
> at that time we may land -no-rm.
> Then stack tracking.




end of thread, other threads:[~2025-03-05  9:00 UTC | newest]

Thread overview: 11+ messages
2025-02-28  6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
2025-02-28  6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
2025-03-01  2:01   ` Alexei Starovoitov
2025-03-01  2:09     ` Eduard Zingerman
2025-02-28  6:00 ` [PATCH bpf-next v1 2/3] bpf: use register liveness information for func_states_equal Eduard Zingerman
2025-02-28  6:00 ` [PATCH bpf-next v1 3/3] selftests/bpf: test cases for compute_live_registers() Eduard Zingerman
2025-03-01  2:10 ` [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Alexei Starovoitov
2025-03-01  4:40   ` Eduard Zingerman
2025-03-02  0:09     ` Alexei Starovoitov
2025-03-03 19:28       ` Eduard Zingerman
2025-03-05  9:00       ` Eduard Zingerman
