* [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
@ 2025-02-28 6:00 Eduard Zingerman
2025-02-28 6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-02-28 6:00 UTC (permalink / raw)
To: bpf, ast
Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song, tj,
Eduard Zingerman
This patch set introduces a simple live register DFA analysis.
The analysis is performed as a separate step before the main
verification pass, and results are stored in `env->insn_aux_data` for
each instruction.
This change improves handling of iterator/callback-based loops, as
regular register liveness marks are not finalized while loops are
being processed. See veristat results for selftests and sched_ext in
patch #2.
The patch set was tested in branch [1] by disabling the current
register parentage chain liveness computation, using DFA-based
liveness for registers while assuming all stack slots as live.
No notable regressions were found in test_progs-based tests.
Note: For regular subprogram calls, the analysis conservatively
assumes that registers r1-r5 are used and that r0 is used at each
`exit` instruction. Experiments in [1] show that adding precise
handling for these cases has no impact on verification performance for
selftests and sched_ext.
This was previously shared as RFC [2].
Changes since RFC:
- The parameter count for helpers and kfuncs is now taken into
account.
- The analysis is now enabled only in privileged mode (Alexei).
- The `copy_verifier_state()` bug fix was merged separately and is no
longer a part of this patch set.
[1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off
[2] https://lore.kernel.org/bpf/20250122120442.3536298-1-eddyz87@gmail.com/
Eduard Zingerman (3):
bpf: simple DFA-based live registers analysis
bpf: use register liveness information for func_states_equal
selftests/bpf: test cases for compute_live_registers()
include/linux/bpf_verifier.h | 7 +
kernel/bpf/verifier.c | 394 +++++++++++++++--
.../testing/selftests/bpf/prog_tests/align.c | 11 +-
.../bpf/prog_tests/compute_live_registers.c | 9 +
tools/testing/selftests/bpf/progs/bpf_misc.h | 12 +
.../bpf/progs/compute_live_registers.c | 397 ++++++++++++++++++
.../selftests/bpf/progs/verifier_gotol.c | 6 +-
.../bpf/progs/verifier_iterating_callbacks.c | 6 +-
8 files changed, 804 insertions(+), 38 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/compute_live_registers.c
create mode 100644 tools/testing/selftests/bpf/progs/compute_live_registers.c
--
2.48.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH bpf-next v1 1/3] bpf: simple DFA-based live registers analysis
2025-02-28 6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
@ 2025-02-28 6:00 ` Eduard Zingerman
2025-03-01 2:01 ` Alexei Starovoitov
2025-02-28 6:00 ` [PATCH bpf-next v1 2/3] bpf: use register liveness information for func_states_equal Eduard Zingerman
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2025-02-28 6:00 UTC (permalink / raw)
To: bpf, ast
Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song, tj,
Eduard Zingerman
Compute may-live registers before each instruction in the program.
A register is live before instruction I if it is read by I or by some
instruction S that follows I, provided it is not overwritten between I
and S.
This information will be used in the next patch as a hint in
func_states_equal().
Use a simple algorithm described in [1] to compute this information:
- define the following:
- I.use : the set of all registers read by instruction I;
- I.def : the set of all registers written by instruction I;
- I.in : the set of all registers that may be alive before I
execution;
- I.out : the set of all registers that may be alive after I
execution;
- I.successors : the set of instructions S that might immediately
follow I for some program execution;
- associate separate empty sets 'I.in' and 'I.out' with each instruction;
- visit each instruction in postorder and update the corresponding
'I.in' and 'I.out' sets as follows:
I.out = U [S.in for S in I.successors]
I.in = (I.out / I.def) U I.use
(where U stands for set union, / stands for set difference)
- repeat the computation while I.{in,out} changes for any instruction.
On the implementation side, keep things as simple as possible:
- check_cfg() already marks instructions EXPLORED in post-order,
modify it to save the index of each EXPLORED instruction in a vector;
- represent I.{in,out,use,def} as bitmasks;
- don't split the program into basic blocks and don't maintain the
work queue, instead:
- perform fixed-point computation by visiting each instruction;
- maintain a simple 'changed' flag to track if I.{in,out} changes
for any instruction;
Measurements show that even this simplistic implementation does not
add measurable verification time overhead (at least for selftests).
Note on check_cfg() ex_insn_beg/ex_done change:
To avoid out-of-bounds access to the env->cfg.insn_postorder array,
it must be guaranteed that an instruction transitions to the EXPLORED
state only once. Previously, this was not the case for incorrect
programs with direct calls to exception callbacks.
The 'align' selftest needs adjustment to skip the computed
instruction/live registers printout. Otherwise, it matches lines from
this printout instead of the verification log.
[1] https://en.wikipedia.org/wiki/Live-variable_analysis
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
include/linux/bpf_verifier.h | 6 +
kernel/bpf/verifier.c | 376 ++++++++++++++++--
.../testing/selftests/bpf/prog_tests/align.c | 11 +-
3 files changed, 369 insertions(+), 24 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index bbd013c38ff9..8c23958bc471 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -591,6 +591,8 @@ struct bpf_insn_aux_data {
* accepts callback function as a parameter.
*/
bool calls_callback;
+ /* registers alive before this instruction. */
+ u16 live_regs_before;
};
#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
@@ -747,7 +749,11 @@ struct bpf_verifier_env {
struct {
int *insn_state;
int *insn_stack;
+ /* vector of instruction indexes sorted in post-order */
+ int *insn_postorder;
int cur_stack;
+ /* current position in the insn_postorder vector */
+ int cur_postorder;
} cfg;
struct backtrack_state bt;
struct bpf_insn_hist_entry *insn_hist;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dcd0da4e62fc..4ac7dc58d9b1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3353,6 +3353,15 @@ static int add_subprog_and_kfunc(struct bpf_verifier_env *env)
return 0;
}
+static int jmp_offset(struct bpf_insn *insn)
+{
+ u8 code = insn->code;
+
+ if (code == (BPF_JMP32 | BPF_JA))
+ return insn->imm;
+ return insn->off;
+}
+
static int check_subprogs(struct bpf_verifier_env *env)
{
int i, subprog_start, subprog_end, off, cur_subprog = 0;
@@ -3379,10 +3388,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
goto next;
if (BPF_OP(code) == BPF_EXIT || BPF_OP(code) == BPF_CALL)
goto next;
- if (code == (BPF_JMP32 | BPF_JA))
- off = i + insn[i].imm + 1;
- else
- off = i + insn[i].off + 1;
+ off = i + jmp_offset(&insn[i]) + 1;
if (off < subprog_start || off >= subprog_end) {
verbose(env, "jump out of range from insn %d to %d\n", i, off);
return -EINVAL;
@@ -3912,6 +3918,17 @@ static const char *disasm_kfunc_name(void *data, const struct bpf_insn *insn)
return btf_name_by_offset(desc_btf, func->name_off);
}
+static void verbose_insn(struct bpf_verifier_env *env, struct bpf_insn *insn)
+{
+ const struct bpf_insn_cbs cbs = {
+ .cb_call = disasm_kfunc_name,
+ .cb_print = verbose,
+ .private_data = env,
+ };
+
+ print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
+}
+
static inline void bt_init(struct backtrack_state *bt, u32 frame)
{
bt->frame = frame;
@@ -4112,11 +4129,6 @@ static bool calls_callback(struct bpf_verifier_env *env, int insn_idx);
static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
struct bpf_insn_hist_entry *hist, struct backtrack_state *bt)
{
- const struct bpf_insn_cbs cbs = {
- .cb_call = disasm_kfunc_name,
- .cb_print = verbose,
- .private_data = env,
- };
struct bpf_insn *insn = env->prog->insnsi + idx;
u8 class = BPF_CLASS(insn->code);
u8 opcode = BPF_OP(insn->code);
@@ -4134,7 +4146,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
fmt_stack_mask(env->tmp_str_buf, TMP_STR_BUF_LEN, bt_stack_mask(bt));
verbose(env, "stack=%s before ", env->tmp_str_buf);
verbose(env, "%d: ", idx);
- print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
+ verbose_insn(env, insn);
}
/* If there is a history record that some registers gained range at this insn,
@@ -11011,6 +11023,9 @@ static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
return *ptr ? 0 : -EINVAL;
}
+/* Bitmask with 1s for all caller saved registers */
+#define ALL_CALLER_SAVED_REGS ((1u << CALLER_SAVED_REGS) - 1)
+
static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int *insn_idx_p)
{
@@ -17246,9 +17261,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
static int check_cfg(struct bpf_verifier_env *env)
{
int insn_cnt = env->prog->len;
- int *insn_stack, *insn_state;
+ int *insn_stack, *insn_state, *insn_postorder;
int ex_insn_beg, i, ret = 0;
- bool ex_done = false;
insn_state = env->cfg.insn_state = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
if (!insn_state)
@@ -17260,6 +17274,17 @@ static int check_cfg(struct bpf_verifier_env *env)
return -ENOMEM;
}
+ insn_postorder = env->cfg.insn_postorder = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
+ if (!insn_postorder) {
+ kvfree(insn_state);
+ kvfree(insn_stack);
+ return -ENOMEM;
+ }
+
+ ex_insn_beg = env->exception_callback_subprog
+ ? env->subprog_info[env->exception_callback_subprog].start
+ : 0;
+
insn_state[0] = DISCOVERED; /* mark 1st insn as discovered */
insn_stack[0] = 0; /* 0 is the first instruction */
env->cfg.cur_stack = 1;
@@ -17273,6 +17298,7 @@ static int check_cfg(struct bpf_verifier_env *env)
case DONE_EXPLORING:
insn_state[t] = EXPLORED;
env->cfg.cur_stack--;
+ insn_postorder[env->cfg.cur_postorder++] = t;
break;
case KEEP_EXPLORING:
break;
@@ -17291,13 +17317,10 @@ static int check_cfg(struct bpf_verifier_env *env)
goto err_free;
}
- if (env->exception_callback_subprog && !ex_done) {
- ex_insn_beg = env->subprog_info[env->exception_callback_subprog].start;
-
+ if (ex_insn_beg && insn_state[ex_insn_beg] != EXPLORED) {
insn_state[ex_insn_beg] = DISCOVERED;
insn_stack[0] = ex_insn_beg;
env->cfg.cur_stack = 1;
- ex_done = true;
goto walk_cfg;
}
@@ -19121,19 +19144,13 @@ static int do_check(struct bpf_verifier_env *env)
}
if (env->log.level & BPF_LOG_LEVEL) {
- const struct bpf_insn_cbs cbs = {
- .cb_call = disasm_kfunc_name,
- .cb_print = verbose,
- .private_data = env,
- };
-
if (verifier_state_scratched(env))
print_insn_state(env, state, state->curframe);
verbose_linfo(env, env->insn_idx, "; ");
env->prev_log_pos = env->log.end_pos;
verbose(env, "%d: ", env->insn_idx);
- print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
+ verbose_insn(env, insn);
env->prev_insn_print_pos = env->log.end_pos - env->prev_log_pos;
env->prev_log_pos = env->log.end_pos;
}
@@ -23199,6 +23216,312 @@ static int process_fd_array(struct bpf_verifier_env *env, union bpf_attr *attr,
return 0;
}
+static bool can_fallthrough(struct bpf_insn *insn)
+{
+ u8 class = BPF_CLASS(insn->code);
+ u8 opcode = BPF_OP(insn->code);
+
+ if (class != BPF_JMP && class != BPF_JMP32)
+ return true;
+
+ if (opcode == BPF_EXIT || opcode == BPF_JA)
+ return false;
+
+ return true;
+}
+
+static bool can_jump(struct bpf_insn *insn)
+{
+ u8 class = BPF_CLASS(insn->code);
+ u8 opcode = BPF_OP(insn->code);
+
+ if (class != BPF_JMP && class != BPF_JMP32)
+ return false;
+
+ switch (opcode) {
+ case BPF_JA:
+ case BPF_JEQ:
+ case BPF_JNE:
+ case BPF_JLT:
+ case BPF_JLE:
+ case BPF_JGT:
+ case BPF_JGE:
+ case BPF_JSGT:
+ case BPF_JSGE:
+ case BPF_JSLT:
+ case BPF_JSLE:
+ case BPF_JCOND:
+ return true;
+ }
+
+ return false;
+}
+
+static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
+{
+ struct bpf_insn *insn = &prog->insnsi[idx];
+ int i = 0, insn_sz;
+ u32 dst;
+
+ succ[0] = prog->len;
+ succ[1] = prog->len;
+
+ insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
+ if (can_fallthrough(insn) && idx + 1 < prog->len)
+ succ[i++] = idx + insn_sz;
+
+ if (can_jump(insn)) {
+ dst = idx + jmp_offset(insn) + 1;
+ if (i == 0 || succ[0] != dst)
+ succ[i++] = dst;
+ }
+
+ return i;
+}
+
+/* Each field is a register bitmask */
+struct insn_live_regs {
+ u16 use; /* registers read by instruction */
+ u16 def; /* registers written by instruction */
+ u16 in; /* registers that may be alive before instruction */
+ u16 out; /* registers that may be alive after instruction */
+};
+
+/* Compute *use and *def values for the call instruction */
+static void compute_call_live_regs(struct bpf_verifier_env *env,
+ struct bpf_insn *insn,
+ u16 *use, u16 *def)
+{
+ struct bpf_kfunc_call_arg_meta meta;
+ const struct bpf_func_proto *fn;
+ int err, i, nargs;
+
+ *def = ALL_CALLER_SAVED_REGS;
+ *use = *def & ~BIT(BPF_REG_0);
+ if (bpf_helper_call(insn)) {
+ err = get_helper_proto(env, insn->imm, &fn);
+ if (err)
+ return;
+ *use = 0;
+ for (i = 1; i < CALLER_SAVED_REGS; i++) {
+ if (fn->arg_type[i - 1] == ARG_DONTCARE)
+ break;
+ *use |= BIT(i);
+ }
+ } else if (bpf_pseudo_kfunc_call(insn)) {
+ err = fetch_kfunc_meta(env, insn, &meta, NULL);
+ if (err)
+ return;
+ nargs = btf_type_vlen(meta.func_proto);
+ *use = 0;
+ for (i = 1; i <= nargs; i++)
+ *use |= BIT(i);
+ }
+}
+
+/* Compute info->{use,def} fields for the instruction */
+static void compute_insn_live_regs(struct bpf_verifier_env *env,
+ struct bpf_insn *insn,
+ struct insn_live_regs *info)
+{
+ u8 class = BPF_CLASS(insn->code);
+ u8 code = BPF_OP(insn->code);
+ u8 mode = BPF_MODE(insn->code);
+ u16 src = BIT(insn->src_reg);
+ u16 dst = BIT(insn->dst_reg);
+ u16 r0 = BIT(0);
+ u16 def = 0;
+ u16 use = 0xffff;
+
+ switch (class) {
+ case BPF_LD:
+ switch (mode) {
+ case BPF_IMM:
+ if (BPF_SIZE(insn->code) == BPF_DW) {
+ def = dst;
+ use = 0;
+ }
+ break;
+ case BPF_LD | BPF_ABS:
+ case BPF_LD | BPF_IND:
+ /* stick with defaults */
+ break;
+ }
+ break;
+ case BPF_LDX:
+ switch (mode) {
+ case BPF_MEM:
+ case BPF_MEMSX:
+ def = dst;
+ use = src;
+ break;
+ }
+ break;
+ case BPF_ST:
+ switch (mode) {
+ case BPF_MEM:
+ def = 0;
+ use = dst;
+ break;
+ }
+ break;
+ case BPF_STX:
+ switch (mode) {
+ case BPF_MEM:
+ def = 0;
+ use = dst | src;
+ break;
+ case BPF_ATOMIC:
+ use = dst | src;
+ if (insn->imm & BPF_FETCH) {
+ if (insn->imm == BPF_CMPXCHG)
+ def = r0;
+ else
+ def = src;
+ } else {
+ def = 0;
+ }
+ break;
+ }
+ break;
+ case BPF_ALU:
+ case BPF_ALU64:
+ switch (code) {
+ case BPF_END:
+ use = dst;
+ def = dst;
+ break;
+ case BPF_MOV:
+ def = dst;
+ if (BPF_SRC(insn->code) == BPF_K)
+ use = 0;
+ else
+ use = src;
+ break;
+ default:
+ def = dst;
+ if (BPF_SRC(insn->code) == BPF_K)
+ use = dst;
+ else
+ use = dst | src;
+ }
+ break;
+ case BPF_JMP:
+ case BPF_JMP32:
+ switch (code) {
+ case BPF_JA:
+ def = 0;
+ use = 0;
+ break;
+ case BPF_EXIT:
+ def = 0;
+ use = r0;
+ break;
+ case BPF_CALL:
+ compute_call_live_regs(env, insn, &use, &def);
+ break;
+ default:
+ def = 0;
+ if (BPF_SRC(insn->code) == BPF_K)
+ use = dst;
+ else
+ use = dst | src;
+ }
+ break;
+ }
+
+ info->def = def;
+ info->use = use;
+}
+
+/* Compute may-live registers before each instruction in the program.
+ * A register is live before instruction I if it is read by I or by some
+ * instruction S that follows I, provided it is not overwritten between I
+ * and S.
+ *
+ * Store result in env->insn_aux_data[i].live_regs.
+ */
+static int compute_live_registers(struct bpf_verifier_env *env)
+{
+ struct bpf_insn_aux_data *insn_aux = env->insn_aux_data;
+ struct bpf_insn *insns = env->prog->insnsi;
+ struct insn_live_regs *state;
+ int insn_cnt = env->prog->len;
+ int err = 0, i, j;
+ bool changed;
+
+ /* Use simple algorithm described in:
+ * https://en.wikipedia.org/wiki/Live-variable_analysis
+ *
+ * - visit each instruction in postorder and update
+ * state[i].in, state[i].out as follows:
+ *
+ * state[i].out = U [state[s].in for s in insn_successors(i)]
+ * state[i].in = (state[i].out / state[i].def) U state[i].use
+ *
+ * (where U stands for set union, / stands for set difference)
+ * - repeat the computation while {in,out} fields change for
+ * any instruction.
+ */
+ state = kvcalloc(insn_cnt, sizeof(*state), GFP_KERNEL);
+ if (!state) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ for (i = 0; i < insn_cnt; ++i)
+ compute_insn_live_regs(env, &insns[i], &state[i]);
+
+ changed = true;
+ while (changed) {
+ changed = false;
+ for (i = 0; i < env->cfg.cur_postorder; ++i) {
+ int insn_idx = env->cfg.insn_postorder[i];
+ struct insn_live_regs *live = &state[insn_idx];
+ int succ_num;
+ u32 succ[2];
+ u16 new_out = 0;
+ u16 new_in = 0;
+
+ succ_num = insn_successors(env->prog, insn_idx, succ);
+ for (int s = 0; s < succ_num; ++s)
+ new_out |= state[succ[s]].in;
+ new_in = (new_out & ~live->def) | live->use;
+ if (new_out != live->out || new_in != live->in) {
+ live->in = new_in;
+ live->out = new_out;
+ changed = true;
+ }
+ }
+ }
+
+ for (i = 0; i < insn_cnt; ++i)
+ insn_aux[i].live_regs_before = state[i].in;
+
+ if (env->log.level & BPF_LOG_LEVEL2) {
+ verbose(env, "Live regs before insn:\n");
+ for (i = 0; i < insn_cnt; ++i) {
+ verbose(env, "%3d: ", i);
+ for (j = BPF_REG_0; j < BPF_REG_10; ++j)
+ if (insn_aux[i].live_regs_before & BIT(j))
+ verbose(env, "%d", j);
+ else
+ verbose(env, ".");
+ verbose(env, " ");
+ verbose_insn(env, &insns[i]);
+ if (bpf_is_ldimm64(&insns[i]))
+ i++;
+ }
+ }
+
+out:
+ kvfree(state);
+ kvfree(env->cfg.insn_postorder);
+ env->cfg.insn_postorder = NULL;
+ env->cfg.cur_postorder = 0;
+ return err;
+}
+
int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
{
u64 start_time = ktime_get_ns();
@@ -23320,6 +23643,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
if (ret)
goto skip_full_check;
+ if (is_priv) {
+ ret = compute_live_registers(env);
+ if (ret < 0)
+ goto skip_full_check;
+ }
+
ret = mark_fastcall_patterns(env);
if (ret < 0)
goto skip_full_check;
@@ -23458,6 +23787,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
vfree(env->insn_aux_data);
kvfree(env->insn_hist);
err_free_env:
+ kvfree(env->cfg.insn_postorder);
kvfree(env);
return ret;
}
diff --git a/tools/testing/selftests/bpf/prog_tests/align.c b/tools/testing/selftests/bpf/prog_tests/align.c
index 4ebd0da898f5..1d53a8561ee2 100644
--- a/tools/testing/selftests/bpf/prog_tests/align.c
+++ b/tools/testing/selftests/bpf/prog_tests/align.c
@@ -610,9 +610,11 @@ static int do_test_single(struct bpf_align_test *test)
.log_size = sizeof(bpf_vlog),
.log_level = 2,
);
+ const char *main_pass_start = "0: R1=ctx() R10=fp0";
const char *line_ptr;
int cur_line = -1;
int prog_len, i;
+ char *start;
int fd_prog;
int ret;
@@ -632,7 +634,13 @@ static int do_test_single(struct bpf_align_test *test)
ret = 0;
/* We make a local copy so that we can strtok() it */
strncpy(bpf_vlog_copy, bpf_vlog, sizeof(bpf_vlog_copy));
- line_ptr = strtok(bpf_vlog_copy, "\n");
+ start = strstr(bpf_vlog_copy, main_pass_start);
+ if (!start) {
+ ret = 1;
+ printf("Can't find initial line '%s'\n", main_pass_start);
+ goto out;
+ }
+ line_ptr = strtok(start, "\n");
for (i = 0; i < MAX_MATCHES; i++) {
struct bpf_reg_match m = test->matches[i];
const char *p;
@@ -682,6 +690,7 @@ static int do_test_single(struct bpf_align_test *test)
break;
}
}
+out:
if (fd_prog >= 0)
close(fd_prog);
}
--
2.48.1
* [PATCH bpf-next v1 2/3] bpf: use register liveness information for func_states_equal
2025-02-28 6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
2025-02-28 6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
@ 2025-02-28 6:00 ` Eduard Zingerman
2025-02-28 6:00 ` [PATCH bpf-next v1 3/3] selftests/bpf: test cases for compute_live_registers() Eduard Zingerman
2025-03-01 2:10 ` [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Alexei Starovoitov
3 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-02-28 6:00 UTC (permalink / raw)
To: bpf, ast
Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song, tj,
Eduard Zingerman
Liveness analysis DFA computes a set of registers that are live before
each instruction. Leverage this information to skip the comparison of
dead registers in `func_states_equal()`. This helps with the
convergence of iterator-based loops, as `bpf_reg_state->live` marks
can't be used when loops are processed. For now, enable this only in
privileged mode.
Verification performance impact on selftests and sched_ext is listed
below. (Using `veristat -C -f "insns_pct>5" -f "!insns<200"` filters).
selftests:
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
-------------------- ----------------------------- --------- --------- ---------------- ---------- ---------- --------------
arena_htab_asm.bpf.o arena_htab_asm 445 413 -32 (-7.19%) 37 33 -4 (-10.81%)
arena_list.bpf.o arena_list_add 1822 833 -989 (-54.28%) 37 22 -15 (-40.54%)
dynptr_success.bpf.o test_dynptr_copy 267 203 -64 (-23.97%) 22 16 -6 (-27.27%)
dynptr_success.bpf.o test_dynptr_copy_xdp 719 615 -104 (-14.46%) 68 58 -10 (-14.71%)
iters.bpf.o checkpoint_states_deletion 22154 1211 -20943 (-94.53%) 918 40 -878 (-95.64%)
iters.bpf.o clean_live_states 1348 588 -760 (-56.38%) 136 66 -70 (-51.47%)
iters.bpf.o iter_nested_deeply_iters 367 300 -67 (-18.26%) 43 37 -6 (-13.95%)
iters.bpf.o iter_nested_iters 772 632 -140 (-18.13%) 72 62 -10 (-13.89%)
iters.bpf.o iter_pass_iter_ptr_to_subprog 285 243 -42 (-14.74%) 30 26 -4 (-13.33%)
iters.bpf.o iter_subprog_iters 808 664 -144 (-17.82%) 68 59 -9 (-13.24%)
iters.bpf.o loop_state_deps2 356 321 -35 (-9.83%) 35 32 -3 (-8.57%)
iters_css.bpf.o iter_css_for_each 296 267 -29 (-9.80%) 32 29 -3 (-9.38%)
pyperf600_iter.bpf.o on_event 6379 2591 -3788 (-59.38%) 286 192 -94 (-32.87%)
test_usdt.bpf.o usdt12 1983 1803 -180 (-9.08%) 143 136 -7 (-4.90%)
sched_ext:
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
----------------- ---------------------- --------- --------- ---------------- ---------- ---------- ---------------
bpf.bpf.o lavd_dispatch 154608 120590 -34018 (-22.00%) 8950 7065 -1885 (-21.06%)
bpf.bpf.o lavd_init 7330 6935 -395 (-5.39%) 516 480 -36 (-6.98%)
bpf.bpf.o layered_dispatch 9039 5590 -3449 (-38.16%) 662 501 -161 (-24.32%)
bpf.bpf.o layered_dump 5022 3669 -1353 (-26.94%) 298 237 -61 (-20.47%)
bpf.bpf.o layered_init 5549 4298 -1251 (-22.54%) 523 423 -100 (-19.12%)
bpf.bpf.o layered_init_task 270 234 -36 (-13.33%) 24 22 -2 (-8.33%)
bpf.bpf.o layered_runnable 1899 1635 -264 (-13.90%) 151 125 -26 (-17.22%)
bpf.bpf.o p2dq_dispatch 659 533 -126 (-19.12%) 66 53 -13 (-19.70%)
bpf.bpf.o p2dq_init 1936 1560 -376 (-19.42%) 170 142 -28 (-16.47%)
bpf.bpf.o refresh_layer_cpumasks 1285 785 -500 (-38.91%) 120 78 -42 (-35.00%)
bpf.bpf.o rustland_init 476 413 -63 (-13.24%) 37 34 -3 (-8.11%)
bpf.bpf.o rusty_select_cpu 1386 1110 -276 (-19.91%) 125 108 -17 (-13.60%)
bpf.bpf.o rusty_set_cpumask 4558 4276 -282 (-6.19%) 323 313 -10 (-3.10%)
scx_central.bpf.o central_dispatch 600 422 -178 (-29.67%) 59 43 -16 (-27.12%)
scx_central.bpf.o central_init 632 318 -314 (-49.68%) 39 28 -11 (-28.21%)
scx_nest.bpf.o nest_init 601 519 -82 (-13.64%) 58 51 -7 (-12.07%)
scx_pair.bpf.o pair_dispatch 1914 1376 -538 (-28.11%) 142 111 -31 (-21.83%)
scx_qmap.bpf.o qmap_dispatch 2187 1703 -484 (-22.13%) 174 141 -33 (-18.97%)
scx_qmap.bpf.o qmap_init 22777 18458 -4319 (-18.96%) 768 654 -114 (-14.84%)
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
include/linux/bpf_verifier.h | 1 +
kernel/bpf/verifier.c | 18 ++++++++++++++----
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 8c23958bc471..39097835b326 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -733,6 +733,7 @@ struct bpf_verifier_env {
* to writes with variable offset and to indirect (helper) accesses.
*/
bool allow_uninit_stack;
+ bool allow_liveregs_dfa;
bool bpf_capable;
bool bypass_spec_v1;
bool bypass_spec_v4;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ac7dc58d9b1..b6ab49ee31e1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18358,15 +18358,20 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c
* the current state will reach 'bpf_exit' instruction safely
*/
static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_state *old,
- struct bpf_func_state *cur, enum exact_level exact)
+ struct bpf_func_state *cur, u32 insn_idx, enum exact_level exact)
{
- int i;
+ u16 live_regs;
+ u16 i;
if (old->callback_depth > cur->callback_depth)
return false;
+ live_regs = env->allow_liveregs_dfa
+ ? env->insn_aux_data[insn_idx].live_regs_before
+ : 0xffff;
for (i = 0; i < MAX_BPF_REG; i++)
- if (!regsafe(env, &old->regs[i], &cur->regs[i],
+ if ((BIT(i) & live_regs) &&
+ !regsafe(env, &old->regs[i], &cur->regs[i],
&env->idmap_scratch, exact))
return false;
@@ -18387,6 +18392,7 @@ static bool states_equal(struct bpf_verifier_env *env,
struct bpf_verifier_state *cur,
enum exact_level exact)
{
+ u32 insn_idx;
int i;
if (old->curframe != cur->curframe)
@@ -18410,9 +18416,12 @@ static bool states_equal(struct bpf_verifier_env *env,
* and all frame states need to be equivalent
*/
for (i = 0; i <= old->curframe; i++) {
+ insn_idx = i == old->curframe
+ ? env->insn_idx
+ : old->frame[i + 1]->callsite;
if (old->frame[i]->callsite != cur->frame[i]->callsite)
return false;
- if (!func_states_equal(env, old->frame[i], cur->frame[i], exact))
+ if (!func_states_equal(env, old->frame[i], cur->frame[i], insn_idx, exact))
return false;
}
return true;
@@ -23647,6 +23656,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
ret = compute_live_registers(env);
if (ret < 0)
goto skip_full_check;
+ env->allow_liveregs_dfa = true;
}
ret = mark_fastcall_patterns(env);
--
2.48.1
* [PATCH bpf-next v1 3/3] selftests/bpf: test cases for compute_live_registers()
2025-02-28 6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
2025-02-28 6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
2025-02-28 6:00 ` [PATCH bpf-next v1 2/3] bpf: use register liveness information for func_states_equal Eduard Zingerman
@ 2025-02-28 6:00 ` Eduard Zingerman
2025-03-01 2:10 ` [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Alexei Starovoitov
3 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-02-28 6:00 UTC (permalink / raw)
To: bpf, ast
Cc: andrii, daniel, martin.lau, kernel-team, yonghong.song, tj,
Eduard Zingerman
Cover instructions of each kind:
- assignment
- arithmetic
- store/load
- endian conversion
- atomics
- branches, conditional branches, may_goto, calls
- LD_ABS/LD_IND
- address_space_cast
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
.../bpf/prog_tests/compute_live_registers.c | 9 +
tools/testing/selftests/bpf/progs/bpf_misc.h | 12 +
.../bpf/progs/compute_live_registers.c | 397 ++++++++++++++++++
.../selftests/bpf/progs/verifier_gotol.c | 6 +-
.../bpf/progs/verifier_iterating_callbacks.c | 6 +-
5 files changed, 420 insertions(+), 10 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/compute_live_registers.c
create mode 100644 tools/testing/selftests/bpf/progs/compute_live_registers.c
diff --git a/tools/testing/selftests/bpf/prog_tests/compute_live_registers.c b/tools/testing/selftests/bpf/prog_tests/compute_live_registers.c
new file mode 100644
index 000000000000..285f20241fe1
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/compute_live_registers.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "compute_live_registers.skel.h"
+#include "test_progs.h"
+
+void test_compute_live_registers(void)
+{
+ RUN_TESTS(compute_live_registers);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index 34f555da546f..e12e74e7e66e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -213,4 +213,16 @@
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif
+#if (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
+ (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64) || \
+ defined(__TARGET_ARCH_arm) || defined(__TARGET_ARCH_s390) || \
+ defined(__TARGET_ARCH_loongarch)) && \
+ __clang_major__ >= 18
+#define CAN_USE_GOTOL
+#endif
+
+#if __clang_major__ >= 18
+#define CAN_USE_BPF_ST
+#endif
+
#endif
diff --git a/tools/testing/selftests/bpf/progs/compute_live_registers.c b/tools/testing/selftests/bpf/progs/compute_live_registers.c
new file mode 100644
index 000000000000..f976dec2bb88
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/compute_live_registers.c
@@ -0,0 +1,397 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "../../../include/linux/filter.h"
+#include "bpf_arena_common.h"
+#include "bpf_misc.h"
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, 1);
+ __type(key, __u32);
+ __type(value, __u64);
+} test_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARENA);
+ __uint(map_flags, BPF_F_MMAPABLE);
+ __uint(max_entries, 1);
+} arena SEC(".maps");
+
+SEC("socket")
+__log_level(2)
+__msg(" 0: .......... (b7) r0 = 42")
+__msg(" 1: 0......... (bf) r1 = r0")
+__msg(" 2: .1........ (bf) r2 = r1")
+__msg(" 3: ..2....... (bf) r3 = r2")
+__msg(" 4: ...3...... (bf) r4 = r3")
+__msg(" 5: ....4..... (bf) r5 = r4")
+__msg(" 6: .....5.... (bf) r6 = r5")
+__msg(" 7: ......6... (bf) r7 = r6")
+__msg(" 8: .......7.. (bf) r8 = r7")
+__msg(" 9: ........8. (bf) r9 = r8")
+__msg("10: .........9 (bf) r0 = r9")
+__msg("11: 0......... (95) exit")
+__naked void assign_chain(void)
+{
+ asm volatile (
+ "r0 = 42;"
+ "r1 = r0;"
+ "r2 = r1;"
+ "r3 = r2;"
+ "r4 = r3;"
+ "r5 = r4;"
+ "r6 = r5;"
+ "r7 = r6;"
+ "r8 = r7;"
+ "r9 = r8;"
+ "r0 = r9;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("0: .......... (b7) r1 = 7")
+__msg("1: .1........ (07) r1 += 7")
+__msg("2: .......... (b7) r2 = 7")
+__msg("3: ..2....... (b7) r3 = 42")
+__msg("4: ..23...... (0f) r2 += r3")
+__msg("5: .......... (b7) r0 = 0")
+__msg("6: 0......... (95) exit")
+__naked void arithmetics(void)
+{
+ asm volatile (
+ "r1 = 7;"
+ "r1 += 7;"
+ "r2 = 7;"
+ "r3 = 42;"
+ "r2 += r3;"
+ "r0 = 0;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+#ifdef CAN_USE_BPF_ST
+SEC("socket")
+__log_level(2)
+__msg(" 1: .1........ (07) r1 += -8")
+__msg(" 2: .1........ (7a) *(u64 *)(r1 +0) = 7")
+__msg(" 3: .1........ (b7) r2 = 42")
+__msg(" 4: .12....... (7b) *(u64 *)(r1 +0) = r2")
+__msg(" 5: .12....... (7b) *(u64 *)(r1 +0) = r2")
+__msg(" 6: .......... (b7) r0 = 0")
+__naked void store(void)
+{
+ asm volatile (
+ "r1 = r10;"
+ "r1 += -8;"
+ "*(u64 *)(r1 +0) = 7;"
+ "r2 = 42;"
+ "*(u64 *)(r1 +0) = r2;"
+ "*(u64 *)(r1 +0) = r2;"
+ "r0 = 0;"
+ "exit;"
+ ::: __clobber_all);
+}
+#endif
+
+SEC("socket")
+__log_level(2)
+__msg("1: ....4..... (07) r4 += -8")
+__msg("2: ....4..... (79) r5 = *(u64 *)(r4 +0)")
+__msg("3: ....45.... (07) r4 += -8")
+__naked void load(void)
+{
+ asm volatile (
+ "r4 = r10;"
+ "r4 += -8;"
+ "r5 = *(u64 *)(r4 +0);"
+ "r4 += -8;"
+ "r0 = r5;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("0: .1........ (61) r2 = *(u32 *)(r1 +0)")
+__msg("1: ..2....... (d4) r2 = le64 r2")
+__msg("2: ..2....... (bf) r0 = r2")
+__naked void endian(void)
+{
+ asm volatile (
+ "r2 = *(u32 *)(r1 +0);"
+ "r2 = le64 r2;"
+ "r0 = r2;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg(" 8: 0......... (b7) r1 = 1")
+__msg(" 9: 01........ (db) r1 = atomic64_fetch_add((u64 *)(r0 +0), r1)")
+__msg("10: 01........ (c3) lock *(u32 *)(r0 +0) += r1")
+__msg("11: 01........ (db) r1 = atomic64_xchg((u64 *)(r0 +0), r1)")
+__msg("12: 01........ (bf) r2 = r0")
+__msg("13: .12....... (bf) r0 = r1")
+__msg("14: .12....... (db) r0 = atomic64_cmpxchg((u64 *)(r2 +0), r0, r1)")
+__naked void atomic(void)
+{
+ asm volatile (
+ "r2 = r10;"
+ "r2 += -8;"
+ "r1 = 0;"
+ "*(u64 *)(r2 +0) = r1;"
+ "r1 = %[test_map] ll;"
+ "call %[bpf_map_lookup_elem];"
+ "if r0 == 0 goto 1f;"
+ "r1 = 1;"
+ "r1 = atomic_fetch_add((u64 *)(r0 +0), r1);"
+ ".8byte %[add_nofetch];" /* same as "lock *(u32 *)(r0 +0) += r1;" */
+ "r1 = xchg_64(r0 + 0, r1);"
+ "r2 = r0;"
+ "r0 = r1;"
+ "r0 = cmpxchg_64(r2 + 0, r0, r1);"
+ "1: exit;"
+ :
+ : __imm(bpf_map_lookup_elem),
+ __imm_addr(test_map),
+ __imm_insn(add_nofetch, BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_0, BPF_REG_1, 0))
+ : __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("4: .12....7.. (85) call bpf_trace_printk#6")
+__msg("5: 0......7.. (0f) r0 += r7")
+__naked void regular_call(void)
+{
+ asm volatile (
+ "r7 = 1;"
+ "r1 = r10;"
+ "r1 += -8;"
+ "r2 = 1;"
+ "call %[bpf_trace_printk];"
+ "r0 += r7;"
+ "exit;"
+ :
+ : __imm(bpf_trace_printk)
+ : __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("2: 012....... (25) if r1 > 0x7 goto pc+1")
+__msg("3: ..2....... (bf) r0 = r2")
+__naked void if1(void)
+{
+ asm volatile (
+ "r0 = 1;"
+ "r2 = 2;"
+ "if r1 > 0x7 goto +1;"
+ "r0 = r2;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("3: 0123...... (2d) if r1 > r3 goto pc+1")
+__msg("4: ..2....... (bf) r0 = r2")
+__naked void if2(void)
+{
+ asm volatile (
+ "r0 = 1;"
+ "r2 = 2;"
+ "r3 = 7;"
+ "if r1 > r3 goto +1;"
+ "r0 = r2;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("0: .......... (b7) r1 = 0")
+__msg("1: .1........ (b7) r2 = 7")
+__msg("2: .12....... (25) if r1 > 0x7 goto pc+4")
+__msg("3: .12....... (07) r1 += 1")
+__msg("4: .12....... (27) r2 *= 2")
+__msg("5: .12....... (05) goto pc+0")
+__msg("6: .12....... (05) goto pc-5")
+__msg("7: .......... (b7) r0 = 0")
+__msg("8: 0......... (95) exit")
+__naked void loop(void)
+{
+ asm volatile (
+ "r1 = 0;"
+ "r2 = 7;"
+ "if r1 > 0x7 goto +4;"
+ "r1 += 1;"
+ "r2 *= 2;"
+ "goto +0;"
+ "goto -5;"
+ "r0 = 0;"
+ "exit;"
+	::: __clobber_all);
+}
+
+#ifdef CAN_USE_GOTOL
+SEC("socket")
+__log_level(2)
+__msg("2: .123...... (25) if r1 > 0x7 goto pc+2")
+__msg("3: ..2....... (bf) r0 = r2")
+__msg("4: 0......... (06) gotol pc+1")
+__msg("5: ...3...... (bf) r0 = r3")
+__msg("6: 0......... (95) exit")
+__naked void gotol(void)
+{
+ asm volatile (
+ "r2 = 42;"
+ "r3 = 24;"
+ "if r1 > 0x7 goto +2;"
+ "r0 = r2;"
+ "gotol +1;"
+ "r0 = r3;"
+ "exit;"
+	::: __clobber_all);
+}
+#endif
+
+SEC("socket")
+__log_level(2)
+__msg("0: 0......... (b7) r1 = 1")
+__msg("1: 01........ (e5) may_goto pc+1")
+__msg("2: 0......... (05) goto pc-3")
+__msg("3: .1........ (bf) r0 = r1")
+__msg("4: 0......... (95) exit")
+__naked void may_goto(void)
+{
+ asm volatile (
+ "1: r1 = 1;"
+ ".8byte %[may_goto];"
+ "goto 1b;"
+ "r0 = r1;"
+ "exit;"
+	:
+	: __imm_insn(may_goto, BPF_RAW_INSN(BPF_JMP | BPF_JCOND, 0, 0, +1 /* offset */, 0))
+	: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("1: 0......... (18) r2 = 0x7")
+__msg("3: 0.2....... (0f) r0 += r2")
+__naked void ldimm64(void)
+{
+ asm volatile (
+ "r0 = 0;"
+ "r2 = 0x7 ll;"
+ "r0 += r2;"
+ "exit;"
+ :
+ :: __clobber_all);
+}
+
+/* No rules specific for LD_ABS/LD_IND, default behaviour kicks in */
+SEC("socket")
+__log_level(2)
+__msg("2: 0123456789 (30) r0 = *(u8 *)skb[42]")
+__msg("3: 012.456789 (0f) r7 += r0")
+__msg("4: 012.456789 (b7) r3 = 42")
+__msg("5: 0123456789 (50) r0 = *(u8 *)skb[r3 + 0]")
+__msg("6: 0......7.. (0f) r7 += r0")
+__naked void ldabs(void)
+{
+ asm volatile (
+ "r6 = r1;"
+ "r7 = 0;"
+ "r0 = *(u8 *)skb[42];"
+ "r7 += r0;"
+ "r3 = 42;"
+ ".8byte %[ld_ind];" /* same as "r0 = *(u8 *)skb[r3];" */
+ "r7 += r0;"
+ "r0 = r7;"
+ "exit;"
+ :
+ : __imm_insn(ld_ind, BPF_LD_IND(BPF_B, BPF_REG_3, 0))
+ : __clobber_all);
+}
+
+#ifdef __BPF_FEATURE_ADDR_SPACE_CAST
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__log_level(2)
+__msg(" 6: .12345.... (85) call bpf_arena_alloc_pages")
+__msg(" 7: 0......... (bf) r1 = addr_space_cast(r0, 0, 1)")
+__msg(" 8: .1........ (b7) r2 = 42")
+__naked void addr_space_cast(void)
+{
+ asm volatile (
+ "r1 = %[arena] ll;"
+ "r2 = 0;"
+ "r3 = 1;"
+ "r4 = 0;"
+ "r5 = 0;"
+ "call %[bpf_arena_alloc_pages];"
+ "r1 = addr_space_cast(r0, 0, 1);"
+ "r2 = 42;"
+ "*(u64 *)(r1 +0) = r2;"
+ "r0 = 0;"
+ "exit;"
+ :
+ : __imm(bpf_arena_alloc_pages),
+ __imm_addr(arena)
+ : __clobber_all);
+}
+#endif
+
+static __used __naked int aux1(void)
+{
+ asm volatile (
+ "r0 = r1;"
+ "r0 += r2;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+SEC("socket")
+__log_level(2)
+__msg("0: ....45.... (b7) r1 = 1")
+__msg("1: .1..45.... (b7) r2 = 2")
+__msg("2: .12.45.... (b7) r3 = 3")
+/* Conservative liveness for subprog parameters. */
+__msg("3: .12345.... (85) call pc+2")
+__msg("4: .......... (b7) r0 = 0")
+__msg("5: 0......... (95) exit")
+__msg("6: .12....... (bf) r0 = r1")
+__msg("7: 0.2....... (0f) r0 += r2")
+/* Conservative liveness for subprog return value. */
+__msg("8: 0......... (95) exit")
+__naked void subprog1(void)
+{
+ asm volatile (
+ "r1 = 1;"
+ "r2 = 2;"
+ "r3 = 3;"
+ "call aux1;"
+ "r0 = 0;"
+ "exit;"
+ ::: __clobber_all);
+}
+
+/* to retain debug info for BTF generation */
+void kfunc_root(void)
+{
+ bpf_arena_alloc_pages(0, 0, 0, 0, 0);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_gotol.c b/tools/testing/selftests/bpf/progs/verifier_gotol.c
index 05a329ee45ee..d5d8f24df394 100644
--- a/tools/testing/selftests/bpf/progs/verifier_gotol.c
+++ b/tools/testing/selftests/bpf/progs/verifier_gotol.c
@@ -4,11 +4,7 @@
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
-#if (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
- (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64) || \
- defined(__TARGET_ARCH_arm) || defined(__TARGET_ARCH_s390) || \
- defined(__TARGET_ARCH_loongarch)) && \
- __clang_major__ >= 18
+#ifdef CAN_USE_GOTOL
SEC("socket")
__description("gotol, small_imm")
diff --git a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
index e54bb5385bc1..75dd922e4e9f 100644
--- a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
+++ b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
@@ -407,11 +407,7 @@ l0_%=: call %[bpf_jiffies64]; \
: __clobber_all);
}
-#if (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
- (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64) || \
- defined(__TARGET_ARCH_arm) || defined(__TARGET_ARCH_s390) || \
- defined(__TARGET_ARCH_loongarch)) && \
- __clang_major__ >= 18
+#ifdef CAN_USE_GOTOL
SEC("socket")
__success __retval(0)
__naked void gotol_and_may_goto(void)
--
2.48.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v1 1/3] bpf: simple DFA-based live registers analysis
2025-02-28 6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
@ 2025-03-01 2:01 ` Alexei Starovoitov
2025-03-01 2:09 ` Eduard Zingerman
0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2025-03-01 2:01 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo
On Thu, Feb 27, 2025 at 10:01 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Compute may-live registers before each instruction in the program.
> A register is live before instruction I if it is read by I or by some
> instruction S that follows I, provided it is not overwritten between I
> and S.
>
> This information will be used in the next patch as a hint in
> func_states_equal().
>
> Use a simple algorithm described in [1] to compute this information:
> - define the following:
> - I.use : the set of all registers read by instruction I;
> - I.def : the set of all registers written by instruction I;
> - I.in : the set of all registers that may be alive before I
> execution;
> - I.out : the set of all registers that may be alive after I
> execution;
> - I.successors : the set of instructions S that might immediately
> follow I for some program execution;
> - associate separate empty sets 'I.in' and 'I.out' with each instruction;
> - visit each instruction in a postorder and update the corresponding
> 'I.in' and 'I.out' sets as follows:
>
> I.out = U [S.in for S in I.successors]
> I.in = (I.out / I.def) U I.use
>
> (where U stands for set union, / stands for set difference)
> - repeat the computation while I.{in,out} changes for any instruction.
>
> On the implementation side, keep things as simple as possible:
> - check_cfg() already marks instructions EXPLORED in post-order,
> modify it to save the index of each EXPLORED instruction in a vector;
> - represent I.{in,out,use,def} as bitmasks;
> - don't split the program into basic blocks and don't maintain the
> work queue, instead:
> - perform fixed-point computation by visiting each instruction;
> - maintain a simple 'changed' flag to track if I.{in,out} changes
> for any instruction;
>
> Measurements show that even this simplistic implementation does not
> add measurable verification time overhead (at least for selftests).
>
> Note on check_cfg() ex_insn_beg/ex_done change:
> To avoid out of bounds access to env->cfg.insn_postorder array,
> it must be guaranteed that an instruction transitions to the EXPLORED
> state only once. Previously, this was not the case for incorrect
> programs with direct calls to exception callbacks.
>
> The 'align' selftest needs adjustment to skip the computed
> instruction/live registers printout. Otherwise, it matches lines from
> this printout instead of the verification log.
>
> [1] https://en.wikipedia.org/wiki/Live-variable_analysis
>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
> include/linux/bpf_verifier.h | 6 +
> kernel/bpf/verifier.c | 376 ++++++++++++++++--
> .../testing/selftests/bpf/prog_tests/align.c | 11 +-
> 3 files changed, 369 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index bbd013c38ff9..8c23958bc471 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -591,6 +591,8 @@ struct bpf_insn_aux_data {
> * accepts callback function as a parameter.
> */
> bool calls_callback;
> + /* registers alive before this instruction. */
> + u16 live_regs_before;
> };
>
> #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
> @@ -747,7 +749,11 @@ struct bpf_verifier_env {
> struct {
> int *insn_state;
> int *insn_stack;
> + /* vector of instruction indexes sorted in post-order */
> + int *insn_postorder;
> int cur_stack;
> + /* current position in the insn_postorder vector */
> + int cur_postorder;
> } cfg;
> struct backtrack_state bt;
> struct bpf_insn_hist_entry *insn_hist;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index dcd0da4e62fc..4ac7dc58d9b1 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3353,6 +3353,15 @@ static int add_subprog_and_kfunc(struct bpf_verifier_env *env)
> return 0;
> }
>
> +static int jmp_offset(struct bpf_insn *insn)
> +{
> + u8 code = insn->code;
> +
> + if (code == (BPF_JMP32 | BPF_JA))
> + return insn->imm;
> + return insn->off;
> +}
> +
> static int check_subprogs(struct bpf_verifier_env *env)
> {
> int i, subprog_start, subprog_end, off, cur_subprog = 0;
> @@ -3379,10 +3388,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
> goto next;
> if (BPF_OP(code) == BPF_EXIT || BPF_OP(code) == BPF_CALL)
> goto next;
> - if (code == (BPF_JMP32 | BPF_JA))
> - off = i + insn[i].imm + 1;
> - else
> - off = i + insn[i].off + 1;
> + off = i + jmp_offset(&insn[i]) + 1;
Nice cleanup, but pls split it into pre-patch,
so that main liveness logic is in the main patch.
> if (off < subprog_start || off >= subprog_end) {
> verbose(env, "jump out of range from insn %d to %d\n", i, off);
> return -EINVAL;
> @@ -3912,6 +3918,17 @@ static const char *disasm_kfunc_name(void *data, const struct bpf_insn *insn)
> return btf_name_by_offset(desc_btf, func->name_off);
> }
>
> +static void verbose_insn(struct bpf_verifier_env *env, struct bpf_insn *insn)
> +{
> + const struct bpf_insn_cbs cbs = {
> + .cb_call = disasm_kfunc_name,
> + .cb_print = verbose,
> + .private_data = env,
> + };
> +
> + print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
> +}
> +
> static inline void bt_init(struct backtrack_state *bt, u32 frame)
> {
> bt->frame = frame;
> @@ -4112,11 +4129,6 @@ static bool calls_callback(struct bpf_verifier_env *env, int insn_idx);
> static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
> struct bpf_insn_hist_entry *hist, struct backtrack_state *bt)
> {
> - const struct bpf_insn_cbs cbs = {
> - .cb_call = disasm_kfunc_name,
> - .cb_print = verbose,
> - .private_data = env,
> - };
> struct bpf_insn *insn = env->prog->insnsi + idx;
> u8 class = BPF_CLASS(insn->code);
> u8 opcode = BPF_OP(insn->code);
> @@ -4134,7 +4146,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
> fmt_stack_mask(env->tmp_str_buf, TMP_STR_BUF_LEN, bt_stack_mask(bt));
> verbose(env, "stack=%s before ", env->tmp_str_buf);
> verbose(env, "%d: ", idx);
> - print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
> + verbose_insn(env, insn);
Same. Nice cleanup, but move it into pre-patch with all cleanups.
No need for separate patches per cleanup.
Just all cleanups into one will do.
> }
>
> /* If there is a history record that some registers gained range at this insn,
> @@ -11011,6 +11023,9 @@ static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
> return *ptr ? 0 : -EINVAL;
> }
>
> +/* Bitmask with 1s for all caller saved registers */
> +#define ALL_CALLER_SAVED_REGS ((1u << CALLER_SAVED_REGS) - 1)
> +
> static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> int *insn_idx_p)
> {
> @@ -17246,9 +17261,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> static int check_cfg(struct bpf_verifier_env *env)
> {
> int insn_cnt = env->prog->len;
> - int *insn_stack, *insn_state;
> + int *insn_stack, *insn_state, *insn_postorder;
> int ex_insn_beg, i, ret = 0;
> - bool ex_done = false;
>
> insn_state = env->cfg.insn_state = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
> if (!insn_state)
> @@ -17260,6 +17274,17 @@ static int check_cfg(struct bpf_verifier_env *env)
> return -ENOMEM;
> }
>
> + insn_postorder = env->cfg.insn_postorder = kvcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
> + if (!insn_postorder) {
> + kvfree(insn_state);
> + kvfree(insn_stack);
> + return -ENOMEM;
> + }
> +
> + ex_insn_beg = env->exception_callback_subprog
> + ? env->subprog_info[env->exception_callback_subprog].start
> + : 0;
> +
> insn_state[0] = DISCOVERED; /* mark 1st insn as discovered */
> insn_stack[0] = 0; /* 0 is the first instruction */
> env->cfg.cur_stack = 1;
> @@ -17273,6 +17298,7 @@ static int check_cfg(struct bpf_verifier_env *env)
> case DONE_EXPLORING:
> insn_state[t] = EXPLORED;
> env->cfg.cur_stack--;
> + insn_postorder[env->cfg.cur_postorder++] = t;
> break;
> case KEEP_EXPLORING:
> break;
> @@ -17291,13 +17317,10 @@ static int check_cfg(struct bpf_verifier_env *env)
> goto err_free;
> }
>
> - if (env->exception_callback_subprog && !ex_done) {
> - ex_insn_beg = env->subprog_info[env->exception_callback_subprog].start;
> -
> + if (ex_insn_beg && insn_state[ex_insn_beg] != EXPLORED) {
> insn_state[ex_insn_beg] = DISCOVERED;
> insn_stack[0] = ex_insn_beg;
> env->cfg.cur_stack = 1;
> - ex_done = true;
> goto walk_cfg;
> }
>
> @@ -19121,19 +19144,13 @@ static int do_check(struct bpf_verifier_env *env)
> }
>
> if (env->log.level & BPF_LOG_LEVEL) {
> - const struct bpf_insn_cbs cbs = {
> - .cb_call = disasm_kfunc_name,
> - .cb_print = verbose,
> - .private_data = env,
> - };
> -
> if (verifier_state_scratched(env))
> print_insn_state(env, state, state->curframe);
>
> verbose_linfo(env, env->insn_idx, "; ");
> env->prev_log_pos = env->log.end_pos;
> verbose(env, "%d: ", env->insn_idx);
> - print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
> + verbose_insn(env, insn);
> env->prev_insn_print_pos = env->log.end_pos - env->prev_log_pos;
> env->prev_log_pos = env->log.end_pos;
> }
> @@ -23199,6 +23216,312 @@ static int process_fd_array(struct bpf_verifier_env *env, union bpf_attr *attr,
> return 0;
> }
>
> +static bool can_fallthrough(struct bpf_insn *insn)
> +{
> + u8 class = BPF_CLASS(insn->code);
> + u8 opcode = BPF_OP(insn->code);
> +
> + if (class != BPF_JMP && class != BPF_JMP32)
> + return true;
> +
> + if (opcode == BPF_EXIT || opcode == BPF_JA)
> + return false;
> +
> + return true;
> +}
> +
> +static bool can_jump(struct bpf_insn *insn)
> +{
> + u8 class = BPF_CLASS(insn->code);
> + u8 opcode = BPF_OP(insn->code);
> +
> + if (class != BPF_JMP && class != BPF_JMP32)
> + return false;
> +
> + switch (opcode) {
> + case BPF_JA:
> + case BPF_JEQ:
> + case BPF_JNE:
> + case BPF_JLT:
> + case BPF_JLE:
> + case BPF_JGT:
> + case BPF_JGE:
> + case BPF_JSGT:
> + case BPF_JSGE:
> + case BPF_JSLT:
> + case BPF_JSLE:
> + case BPF_JCOND:
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static int insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2])
> +{
> + struct bpf_insn *insn = &prog->insnsi[idx];
> + int i = 0, insn_sz;
> + u32 dst;
> +
> + succ[0] = prog->len;
> + succ[1] = prog->len;
Why initialize them? They won't be used anyway.
> +
> + insn_sz = bpf_is_ldimm64(insn) ? 2 : 1;
> + if (can_fallthrough(insn) && idx + 1 < prog->len)
> + succ[i++] = idx + insn_sz;
> +
> + if (can_jump(insn)) {
> + dst = idx + jmp_offset(insn) + 1;
> + if (i == 0 || succ[0] != dst)
> + succ[i++] = dst;
> + }
> +
> + return i;
> +}
> +
> +/* Each field is a register bitmask */
> +struct insn_live_regs {
> + u16 use; /* registers read by instruction */
> + u16 def; /* registers written by instruction */
> + u16 in; /* registers that may be alive before instruction */
> + u16 out; /* registers that may be alive after instruction */
> +};
> +
> +/* Compute *use and *def values for the call instruction */
> +static void compute_call_live_regs(struct bpf_verifier_env *env,
> + struct bpf_insn *insn,
> + u16 *use, u16 *def)
> +{
> + struct bpf_kfunc_call_arg_meta meta;
> + const struct bpf_func_proto *fn;
> + int err, i, nargs;
> +
> + *def = ALL_CALLER_SAVED_REGS;
> + *use = *def & ~BIT(BPF_REG_0);
> + if (bpf_helper_call(insn)) {
> + err = get_helper_proto(env, insn->imm, &fn);
> + if (err)
> + return;
> + *use = 0;
> + for (i = 1; i < CALLER_SAVED_REGS; i++) {
> + if (fn->arg_type[i - 1] == ARG_DONTCARE)
> + break;
> + *use |= BIT(i);
> + }
> + } else if (bpf_pseudo_kfunc_call(insn)) {
> + err = fetch_kfunc_meta(env, insn, &meta, NULL);
> + if (err)
> + return;
> + nargs = btf_type_vlen(meta.func_proto);
> + *use = 0;
> + for (i = 1; i <= nargs; i++)
> + *use |= BIT(i);
> + }
> +}
The helper is very close to helper_fastcall_clobber_mask()
and kfunc_fastcall_clobber_mask(),
and the beginning of mark_fastcall_pattern_for_call().
Let's generalize the code and make it a pre-patch 2.
(separate from trivial cleanups).
In generic helper I'd standardize on 'use' word instead
of 'clobber_mask'.
The move of ALL_CALLER_SAVED_REGS can be there as well.
> +
> +/* Compute info->{use,def} fields for the instruction */
> +static void compute_insn_live_regs(struct bpf_verifier_env *env,
> + struct bpf_insn *insn,
> + struct insn_live_regs *info)
> +{
> + u8 class = BPF_CLASS(insn->code);
> + u8 code = BPF_OP(insn->code);
> + u8 mode = BPF_MODE(insn->code);
> + u16 src = BIT(insn->src_reg);
> + u16 dst = BIT(insn->dst_reg);
> + u16 r0 = BIT(0);
> + u16 def = 0;
> + u16 use = 0xffff;
> +
> + switch (class) {
> + case BPF_LD:
> + switch (mode) {
> + case BPF_IMM:
> + if (BPF_SIZE(insn->code) == BPF_DW) {
> + def = dst;
> + use = 0;
> + }
> + break;
> + case BPF_LD | BPF_ABS:
> + case BPF_LD | BPF_IND:
> + /* stick with defaults */
> + break;
> + }
> + break;
> + case BPF_LDX:
> + switch (mode) {
> + case BPF_MEM:
> + case BPF_MEMSX:
> + def = dst;
> + use = src;
> + break;
> + }
> + break;
> + case BPF_ST:
> + switch (mode) {
> + case BPF_MEM:
> + def = 0;
> + use = dst;
> + break;
> + }
> + break;
> + case BPF_STX:
> + switch (mode) {
> + case BPF_MEM:
> + def = 0;
> + use = dst | src;
> + break;
> + case BPF_ATOMIC:
> + use = dst | src;
> + if (insn->imm & BPF_FETCH) {
> + if (insn->imm == BPF_CMPXCHG)
> + def = r0;
> + else
> + def = src;
> + } else {
> + def = 0;
> + }
> + break;
> + }
> + break;
> + case BPF_ALU:
> + case BPF_ALU64:
> + switch (code) {
> + case BPF_END:
> + use = dst;
> + def = dst;
> + break;
> + case BPF_MOV:
> + def = dst;
> + if (BPF_SRC(insn->code) == BPF_K)
> + use = 0;
> + else
> + use = src;
> + break;
> + default:
> + def = dst;
> + if (BPF_SRC(insn->code) == BPF_K)
> + use = dst;
> + else
> + use = dst | src;
> + }
> + break;
> + case BPF_JMP:
> + case BPF_JMP32:
> + switch (code) {
> + case BPF_JA:
> + def = 0;
> + use = 0;
> + break;
> + case BPF_EXIT:
> + def = 0;
> + use = r0;
> + break;
> + case BPF_CALL:
> + compute_call_live_regs(env, insn, &use, &def);
> + break;
> + default:
> + def = 0;
> + if (BPF_SRC(insn->code) == BPF_K)
> + use = dst;
> + else
> + use = dst | src;
> + }
> + break;
> + }
> +
> + info->def = def;
> + info->use = use;
> +}
> +
> +/* Compute may-live registers before each instruction in the program.
> + * A register is live before instruction I if it is read by I or by some
> + * instruction S that follows I, provided it is not overwritten between I
> + * and S.
> + *
> + * Store result in env->insn_aux_data[i].live_regs.
> + */
> +static int compute_live_registers(struct bpf_verifier_env *env)
> +{
> + struct bpf_insn_aux_data *insn_aux = env->insn_aux_data;
> + struct bpf_insn *insns = env->prog->insnsi;
> + struct insn_live_regs *state;
> + int insn_cnt = env->prog->len;
> + int err = 0, i, j;
> + bool changed;
> +
> + /* Use simple algorithm desribed in:
described
> + * https://en.wikipedia.org/wiki/Live-variable_analysis
instead of the link let's copy-paste the definition of
I.use|def from the commit log into this comment.
> + *
> + * - visit each instruction in a postorder and update
> + * state[i].in, state[i].out as follows:
> + *
> + * state[i].out = U [state[s].in for S in insn_successors(i)]
> + * state[i].in = (state[i].out / state[i].def) U state[i].use
> + *
> + * (where U stands for set union, / stands for set difference)
> + * - repeat the computation while {in,out} fields change for
> + * any instruction.
> + */
> + state = kvcalloc(insn_cnt, sizeof(*state), GFP_KERNEL);
> + if (!state) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + for (i = 0; i < insn_cnt; ++i)
> + compute_insn_live_regs(env, &insns[i], &state[i]);
> +
> + changed = true;
> + while (changed) {
> + changed = false;
> + for (i = 0; i < env->cfg.cur_postorder; ++i) {
> + int insn_idx = env->cfg.insn_postorder[i];
> + struct insn_live_regs *live = &state[insn_idx];
> + int succ_num;
> + u32 succ[2];
> + u16 new_out = 0;
> + u16 new_in = 0;
> +
> + succ_num = insn_successors(env->prog, insn_idx, succ);
> + for (int s = 0; s < succ_num; ++s)
> + new_out |= state[succ[s]].in;
> + new_in = (new_out & ~live->def) | live->use;
> + if (new_out != live->out || new_in != live->in) {
> + live->in = new_in;
> + live->out = new_out;
> + changed = true;
> + }
> + }
> + }
> +
> + for (i = 0; i < insn_cnt; ++i)
> + insn_aux[i].live_regs_before = state[i].in;
> +
> + if (env->log.level & BPF_LOG_LEVEL2) {
> + verbose(env, "Live regs before insn:\n");
> + for (i = 0; i < insn_cnt; ++i) {
> + verbose(env, "%3d: ", i);
> + for (j = BPF_REG_0; j < BPF_REG_10; ++j)
> + if (insn_aux[i].live_regs_before & BIT(j))
> + verbose(env, "%d", j);
> + else
> + verbose(env, ".");
I wonder whether we would need to see liveness at log_level=2
during the normal verifier output instead of once at the beginning.
I guess it's fine doing it once like you did.
> + verbose(env, " ");
> + verbose_insn(env, &insns[i]);
> + if (bpf_is_ldimm64(&insns[i]))
> + i++;
> + }
> + }
> +
> +out:
> + kvfree(state);
> + kvfree(env->cfg.insn_postorder);
> + env->cfg.insn_postorder = NULL;
> + env->cfg.cur_postorder = 0;
> + return err;
> +}
> +
> int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
> {
> u64 start_time = ktime_get_ns();
> @@ -23320,6 +23643,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
> if (ret)
> goto skip_full_check;
>
> + if (is_priv) {
> + ret = compute_live_registers(env);
> + if (ret < 0)
> + goto skip_full_check;
> + }
> +
> ret = mark_fastcall_patterns(env);
> if (ret < 0)
> goto skip_full_check;
> @@ -23458,6 +23787,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
> vfree(env->insn_aux_data);
> kvfree(env->insn_hist);
> err_free_env:
> + kvfree(env->cfg.insn_postorder);
> kvfree(env);
> return ret;
> }
> diff --git a/tools/testing/selftests/bpf/prog_tests/align.c b/tools/testing/selftests/bpf/prog_tests/align.c
> index 4ebd0da898f5..1d53a8561ee2 100644
> --- a/tools/testing/selftests/bpf/prog_tests/align.c
> +++ b/tools/testing/selftests/bpf/prog_tests/align.c
> @@ -610,9 +610,11 @@ static int do_test_single(struct bpf_align_test *test)
> .log_size = sizeof(bpf_vlog),
> .log_level = 2,
> );
> + const char *main_pass_start = "0: R1=ctx() R10=fp0";
> const char *line_ptr;
> int cur_line = -1;
> int prog_len, i;
> + char *start;
> int fd_prog;
> int ret;
>
> @@ -632,7 +634,13 @@ static int do_test_single(struct bpf_align_test *test)
> ret = 0;
> /* We make a local copy so that we can strtok() it */
> strncpy(bpf_vlog_copy, bpf_vlog, sizeof(bpf_vlog_copy));
> - line_ptr = strtok(bpf_vlog_copy, "\n");
> + start = strstr(bpf_vlog_copy, main_pass_start);
> + if (!start) {
> + ret = 1;
> + printf("Can't find initial line '%s'\n", main_pass_start);
> + goto out;
> + }
> + line_ptr = strtok(start, "\n");
> for (i = 0; i < MAX_MATCHES; i++) {
> struct bpf_reg_match m = test->matches[i];
> const char *p;
> @@ -682,6 +690,7 @@ static int do_test_single(struct bpf_align_test *test)
> break;
> }
> }
> +out:
> if (fd_prog >= 0)
> close(fd_prog);
> }
> --
> 2.48.1
>
* Re: [PATCH bpf-next v1 1/3] bpf: simple DFA-based live registers analysis
2025-03-01 2:01 ` Alexei Starovoitov
@ 2025-03-01 2:09 ` Eduard Zingerman
0 siblings, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-03-01 2:09 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo
On Fri, 2025-02-28 at 18:01 -0800, Alexei Starovoitov wrote:
> On Thu, Feb 27, 2025 at 10:01 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
[...]
> > @@ -3379,10 +3388,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
> > goto next;
> > if (BPF_OP(code) == BPF_EXIT || BPF_OP(code) == BPF_CALL)
> > goto next;
> > - if (code == (BPF_JMP32 | BPF_JA))
> > - off = i + insn[i].imm + 1;
> > - else
> > - off = i + insn[i].off + 1;
> > + off = i + jmp_offset(&insn[i]) + 1;
>
> Nice cleanup, but pls split it into pre-patch,
> so that main liveness logic is in the main patch.
Thanks for the feedback.
I'll include all requested changes in v2.
[...]
* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
2025-02-28 6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
` (2 preceding siblings ...)
2025-02-28 6:00 ` [PATCH bpf-next v1 3/3] selftests/bpf: test cases for compute_live_registers() Eduard Zingerman
@ 2025-03-01 2:10 ` Alexei Starovoitov
2025-03-01 4:40 ` Eduard Zingerman
3 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2025-03-01 2:10 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo
On Thu, Feb 27, 2025 at 10:00 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> This patch set introduces a simple live register DFA analysis.
> The analysis is performed as a separate step before the main
> verification pass, and results are stored in `env->insn_aux_data` for
> each instruction.
>
> This change improves handling of iterator/callback-based loops, as
> regular register liveness marks are not finalized while loops are
> being processed. See veristat results for selftests and sched_ext in
> patch #2.
>
> The patch set was tested in branch [1] by disabling the current
> register parentage chain liveness computation, using DFA-based
> liveness for registers while assuming all stack slots as live.
> No notable regressions were found in test_progs-based tests.
I think the end goal is to get rid of mark_reg_read() and
switch to proper live reg analysis.
So please include the numbers to see how much work is left.
Also note that mark_reg_read() tracks 32 vs 64 reads separately.
iirc we did it to support fine-grained mark_insn_zext
to help architectures where zext has to be inserted by the JIT.
I'm not sure whether new liveness has to do it as well.
* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
2025-03-01 2:10 ` [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Alexei Starovoitov
@ 2025-03-01 4:40 ` Eduard Zingerman
2025-03-02 0:09 ` Alexei Starovoitov
0 siblings, 1 reply; 11+ messages in thread
From: Eduard Zingerman @ 2025-03-01 4:40 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo
On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:
[...]
> I think the end goal is to get rid of mark_reg_read() and
> switch to proper live reg analysis.
> So please include the numbers to see how much work is left.
Complete removal of mark_reg_read() means that analysis needs to be
done for stack slots as well. The algorithm to handle stack slots is
much more complicated:
- it needs to track register / stack slot types to handle cases like
"r1 = r10" and spills of the stack pointer to the stack;
- it needs to track register values, at least crudely, to handle cases
like "r1 = r10; r1 += r2;" (array access).
The worst-case scenario, as you suggested, is to just assume all stack
slots are live, but that is a big verification performance hit.
Exact numbers are at the end of the email.
> Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> iirc we did it to support fine grain mark_insn_zext
> to help architectures where zext has to be inserted by JIT.
> I'm not sure whether new liveness has to do it as well.
As far as I understand, this is important for one check in
propagate_liveness(). And that check means something like:
"if this register was read as a 64-bit value, remember that
it needs zero extension on a 32-bit load".
Meaning that either DFA would need to track this bit of information
(should be simple), or more zero extensions would be added.
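E.g. a toy model of that extra bit (again, just a Python sketch under
the simplifying assumption of straight-line code; not the kernel's
mark_insn_zext algorithm):

```python
# Sketch: track 64-bit reads separately from 32-bit ones. A 32-bit
# write whose value reaches a 64-bit read needs an explicit zero
# extension on architectures where the JIT must insert it.

def needs_zext(insns):
    """insns: list of (reg_written_as_32bit_or_None, reads64_set).
    Returns indices of 32-bit writes whose register is later read as
    a 64-bit value before being written again (straight-line code)."""
    out = []
    for i, (w32, _) in enumerate(insns):
        if w32 is None:
            continue
        for w, reads64 in insns[i + 1:]:
            if w32 in reads64:
                out.append(i)  # 64-bit read reached: zext needed
                break
            if w == w32:       # overwritten before any 64-bit read
                break
    return out
```

Without the separate reads64 bit, every 32-bit write would have to be
treated as zext-needed, i.e. more zero extensions would be emitted.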
---
Repository [1] shared in cover letter was used for benchmarks below.
Abbreviations are as follows:
- Name: dfa-opts
Commit: b73005452a4a
Meaning: DFA as shared in this patch-set + a set of small
improvements which I decided to exclude from the
patch-set as described in the cover letter.
- Name: dfa-opts-no-rm
Commit: e486757fdada
Meaning: dfa-opts + read marks are disabled for registers.
- Name: dfa-opts-no-rm-sl
Commit: a9930e8127a9
Meaning: dfa-opts + read marks are disabled for registers
and stack.
[1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off
Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
Veristat results are followed by a histogram that accounts for all
tests.
Two comparisons are made:
- dfa-opts vs dfa-opts-no-rm (small negative impact, except two
sched_ext programs that hit 1M instructions limit; positive impact
would have indicated a bug);
- dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
========= selftests: dfa-opts vs dfa-opts-no-rm =========
File Program States (A) States (B) States (DIFF)
------------------------ ---------------- ---------- ---------- -------------
test_l4lb_noinline.bpf.o balancer_ingress 219 231 +12 (+5.48%)
Total progs: 3565
Old success: 2054
New success: 2054
States diff min: 0.00%
States diff max: 5.48%
0% .. 5%: 3564
5% .. 10%: 1
========= scx: dfa-opts vs dfa-opts-no-rm =========
File Program States (A) States (B) States (DIFF)
--------- --------------- ---------- ---------- ------------------
bpf.bpf.o rusty_init 1944 55004 +53060 (+2729.42%)
bpf.bpf.o rusty_init_task 1732 55049 +53317 (+3078.35%)
Total progs: 216
Old success: 186
New success: 184
States diff min: 0.00%
States diff max: 3078.35%
0% .. 5%: 214
2725% .. 3080%: 2
========= selftests: dfa-opts vs dfa-opts-no-rm-sl =========
File Program States (A) States (B) States (DIFF)
-------------------------------- ------------------------------------ ---------- ---------- -----------------
arena_htab_asm.bpf.o arena_htab_asm 33 40 +7 (+21.21%)
bpf_cubic.bpf.o bpf_cubic_cong_avoid 92 98 +6 (+6.52%)
bpf_flow.bpf.o flow_dissector_0 66 125 +59 (+89.39%)
bpf_iter_ksym.bpf.o dump_ksym 16 21 +5 (+31.25%)
profiler1.bpf.o kprobe__proc_sys_write 84 140 +56 (+66.67%)
profiler1.bpf.o kprobe__vfs_link 504 543 +39 (+7.74%)
profiler1.bpf.o kprobe__vfs_symlink 238 466 +228 (+95.80%)
profiler1.bpf.o kprobe_ret__do_filp_open 247 274 +27 (+10.93%)
profiler1.bpf.o raw_tracepoint__sched_process_exec 139 350 +211 (+151.80%)
profiler1.bpf.o raw_tracepoint__sched_process_exit 67 86 +19 (+28.36%)
profiler1.bpf.o tracepoint__syscalls__sys_enter_kill 649 758 +109 (+16.80%)
profiler2.bpf.o kprobe__vfs_link 149 257 +108 (+72.48%)
profiler2.bpf.o kprobe_ret__do_filp_open 106 120 +14 (+13.21%)
profiler2.bpf.o raw_tracepoint__sched_process_exec 126 140 +14 (+11.11%)
profiler3.bpf.o kprobe__vfs_link 805 1182 +377 (+46.83%)
pyperf180.bpf.o on_event 10564 17659 +7095 (+67.16%)
pyperf50.bpf.o on_event 2489 3375 +886 (+35.60%)
pyperf600_iter.bpf.o on_event 192 214 +22 (+11.46%)
pyperf_subprogs.bpf.o on_event 2331 2514 +183 (+7.85%)
setget_sockopt.bpf.o skops_sockopt 429 458 +29 (+6.76%)
setget_sockopt.bpf.o socket_post_create 90 95 +5 (+5.56%)
sock_iter_batch.bpf.o iter_tcp_soreuse 3 5 +2 (+66.67%)
strobemeta_bpf_loop.bpf.o on_event 209 331 +122 (+58.37%)
test_bpf_nf.bpf.o nf_skb_ct_test 41 56 +15 (+36.59%)
test_bpf_nf.bpf.o nf_xdp_ct_test 41 56 +15 (+36.59%)
test_cls_redirect.bpf.o cls_redirect 2175 14083 +11908 (+547.49%)
test_cls_redirect_dynptr.bpf.o cls_redirect 220 327 +107 (+48.64%)
test_cls_redirect_subprogs.bpf.o cls_redirect 4390 17001 +12611 (+287.27%)
test_l4lb.bpf.o balancer_ingress 137 256 +119 (+86.86%)
test_l4lb_noinline.bpf.o balancer_ingress 219 643 +424 (+193.61%)
test_l4lb_noinline_dynptr.bpf.o balancer_ingress 73 182 +109 (+149.32%)
test_misc_tcp_hdr_options.bpf.o misc_estab 88 98 +10 (+11.36%)
test_pkt_access.bpf.o test_pkt_access 21 25 +4 (+19.05%)
test_sock_fields.bpf.o egress_read_sock_fields 20 29 +9 (+45.00%)
test_tc_neigh_fib.bpf.o tc_dst 12 14 +2 (+16.67%)
test_tc_neigh_fib.bpf.o tc_src 12 14 +2 (+16.67%)
test_tcp_custom_syncookie.bpf.o tcp_custom_syncookie 420 560 +140 (+33.33%)
test_tcp_hdr_options.bpf.o estab 189 225 +36 (+19.05%)
test_xdp.bpf.o _xdp_tx_iptunnel 17 18 +1 (+5.88%)
test_xdp_dynptr.bpf.o _xdp_tx_iptunnel 26 36 +10 (+38.46%)
test_xdp_loop.bpf.o _xdp_tx_iptunnel 19 20 +1 (+5.26%)
test_xdp_noinline.bpf.o balancer_ingress_v4 271 1080 +809 (+298.52%)
test_xdp_noinline.bpf.o balancer_ingress_v6 268 1030 +762 (+284.33%)
xdp_features.bpf.o xdp_do_tx 10 13 +3 (+30.00%)
xdp_synproxy_kern.bpf.o syncookie_tc 390 467 +77 (+19.74%)
xdp_synproxy_kern.bpf.o syncookie_xdp 384 450 +66 (+17.19%)
Total progs: 3565
Old success: 2054
New success: 2054
States diff min: -9.09%
States diff max: 547.49%
-10% .. 0%: 3
0% .. 5%: 3492
5% .. 10%: 10
10% .. 15%: 8
15% .. 20%: 10
20% .. 25%: 6
25% .. 35%: 8
35% .. 40%: 4
45% .. 50%: 3
50% .. 55%: 4
55% .. 70%: 4
70% .. 90%: 3
95% .. 105%: 3
145% .. 195%: 3
280% .. 300%: 3
545% .. 550%: 1
========= scx: dfa-opts vs dfa-opts-no-rm-sl =========
File Program States (A) States (B) States (DIFF)
-------------- ------------------ ---------- ---------- ------------------
bpf.bpf.o bpfland_enqueue 18 20 +2 (+11.11%)
bpf.bpf.o bpfland_select_cpu 83 103 +20 (+24.10%)
bpf.bpf.o flash_select_cpu 30 49 +19 (+63.33%)
bpf.bpf.o lavd_cpu_offline 303 360 +57 (+18.81%)
bpf.bpf.o lavd_cpu_online 303 360 +57 (+18.81%)
bpf.bpf.o lavd_dispatch 7065 10652 +3587 (+50.77%)
bpf.bpf.o lavd_init 480 554 +74 (+15.42%)
bpf.bpf.o lavd_running 89 94 +5 (+5.62%)
bpf.bpf.o lavd_select_cpu 451 483 +32 (+7.10%)
bpf.bpf.o layered_dispatch 501 950 +449 (+89.62%)
bpf.bpf.o layered_dump 237 258 +21 (+8.86%)
bpf.bpf.o layered_enqueue 1290 1655 +365 (+28.29%)
bpf.bpf.o layered_init 423 552 +129 (+30.50%)
bpf.bpf.o layered_select_cpu 201 311 +110 (+54.73%)
bpf.bpf.o p2dq_dispatch 53 116 +63 (+118.87%)
bpf.bpf.o rusty_init 1944 55006 +53062 (+2729.53%)
bpf.bpf.o rusty_init_task 1732 55052 +53320 (+3078.52%)
bpf.bpf.o rusty_running 19 23 +4 (+21.05%)
bpf.bpf.o rusty_select_cpu 108 227 +119 (+110.19%)
bpf.bpf.o rusty_set_cpumask 313 479 +166 (+53.04%)
scx_nest.bpf.o nest_select_cpu 49 53 +4 (+8.16%)
Total progs: 216
Old success: 186
New success: 184
States diff min: 0.00%
States diff max: 3078.52%
0% .. 5%: 186
5% .. 10%: 4
10% .. 15%: 5
15% .. 20%: 6
20% .. 25%: 3
25% .. 55%: 6
60% .. 115%: 3
115% .. 3080%: 3
* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
2025-03-01 4:40 ` Eduard Zingerman
@ 2025-03-02 0:09 ` Alexei Starovoitov
2025-03-03 19:28 ` Eduard Zingerman
2025-03-05 9:00 ` Eduard Zingerman
0 siblings, 2 replies; 11+ messages in thread
From: Alexei Starovoitov @ 2025-03-02 0:09 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo
On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:
>
> [...]
>
> > I think the end goal is to get rid of mark_reg_read() and
> > switch to proper live reg analysis.
> > So please include the numbers to see how much work is left.
>
> Complete removal of mark_reg_read() means that analysis needs to be
> done for stack slots as well. The algorithm to handle stack slots is
> much more complicated:
> - it needs to track register / stack slot type to handle cases like
> "r1 = r10" and spills of the stack pointer to stack;
> - it needs to track register values, at least crudely, to handle cases
> like "r1 = r10; r1 += r2;" (array access).
Doing this kind of register movement tracking before do_check()
may be difficult indeed.
Can we do this use/def tracking inline, similar to the current
liveness, but without the ->parent logic, using the postorder array
that this patch adds?
Verifier states are path-sensitive and more accurate, while this one
would be insn-based, but maybe that is good enough?
> The worst-case approach, as you suggested, is to just assume all
> stack slots are live, but that is a big verification performance hit.
> Exact numbers are at the end of the email.
>
> > Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> > iirc we did it to support fine grain mark_insn_zext
> > to help architectures where zext has to be inserted by JIT.
> > I'm not sure whether new liveness has to do it as well.
>
> As far as I understand, this is important for one check in
> propagate_liveness(). And that check means something like:
> "if this register was read as a 64-bit value, remember that
> it needs zero extension on 32-bit load".
>
> Meaning that either DFA would need to track this bit of information
> (should be simple), or more zero extensions would be added.
Right. New liveness doesn't break zext, but makes it worse
for arches that need it. We would need to quantify the impact.
iirc it was noticeable enough that we added this support.
>
> ---
>
> Repository [1] shared in cover letter was used for benchmarks below.
> Abbreviations are as follows:
> - Name: dfa-opts
> Commit: b73005452a4a
> Meaning: DFA as shared in this patch-set + a set of small
> improvements which I decided to exclude from the
> patch-set as described in the cover letter.
> - Name: dfa-opts-no-rm
> Commit: e486757fdada
> Meaning: dfa-opts + read marks are disabled for registers.
> - Name: dfa-opts-no-rm-sl
> Commit: a9930e8127a9
> Meaning: dfa-opts + read marks are disabled for registers
> and stack.
>
> [1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off
>
> Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
> Veristat results are followed by a histogram that accounts for all
> tests.
>
> Two comparisons are made:
> - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
> sched_ext programs that hit 1M instructions limit; positive impact
> would have indicated a bug);
Let's figure out what is causing rusty_init[_task]
to explode.
And proceed with this set in parallel.
> - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
I don't read it as a big negative.
cls_redirect and balancer_ingress need to be understood,
but nothing exploded to hit 1M insns,
so hopefully bare minimum stack tracking would do the trick.
So in terms of priorities, let's land this set, then
figure out rusty_init,
figure out read32 vs 64 for zext,
at that time we may land -no-rm.
Then stack tracking.
* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
2025-03-02 0:09 ` Alexei Starovoitov
@ 2025-03-03 19:28 ` Eduard Zingerman
2025-03-05 9:00 ` Eduard Zingerman
1 sibling, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-03-03 19:28 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo
On Sat, 2025-03-01 at 16:09 -0800, Alexei Starovoitov wrote:
> On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
[...]
> > Complete removal of mark_reg_read() means that analysis needs to be
> > done for stack slots as well. The algorithm to handle stack slots is
> > much more complicated:
> > - it needs to track register / stack slot type to handle cases like
> > "r1 = r10" and spills of the stack pointer to stack;
> > - it needs to track register values, at least crudely, to handle cases
> > like "r1 = r10; r1 += r2;" (array access).
>
> Doing this kind of register movement tracking before do_check()
> may be difficult indeed.
> Can we do this use/def tracking inline similar to current liveness,
> but without ->parent logic.
> Using postorder array that this patch adds ?
> verifier states are path sensitive and more accurate
> while this one will be insn based, but maybe good enough ?
You mean acting like precision tracking? Whenever an instruction is
verified and a use is recorded, propagate this use upwards along the
execution path, updating live-in/live-out sets for the instructions?
The problem here is termination (when to consider live-in/live-out
sets final). DFA computation stops as soon as live-in/live-out marks
stop changing. Idk what this condition should look like for the
scheme above.
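For contrast, the termination condition for the DFA is the standard
one for backward liveness: the sets only ever grow and the register
set is finite, so the fixpoint loop must terminate. A toy Python
sketch of what I mean (illustration only, not the patch code):

```python
def live_sets(n_insns, use, defs, succ):
    """Backward liveness DFA over a CFG.

    use[i]/defs[i]: sets of regs read/written by insn i.
    succ[i]: list of successor instruction indices.
    Terminates because live_in/live_out only grow and the register
    set is finite (monotone fixpoint iteration)."""
    live_in = [set() for _ in range(n_insns)]
    live_out = [set() for _ in range(n_insns)]
    changed = True
    while changed:                 # stop when marks stop changing
        changed = False
        for i in reversed(range(n_insns)):   # backward sweep
            out = (set().union(*(live_in[s] for s in succ[i]))
                   if succ[i] else set())
            inn = use[i] | (out - defs[i])
            if out != live_out[i] or inn != live_in[i]:
                live_in[i], live_out[i] = inn, out
                changed = True
    return live_in, live_out
```

With the inline scheme there is no such monotone iteration over a
fixed instruction array, which is why I don't see the analogous
stopping rule.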
[...]
> > > Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> > > iirc we did it to support fine grain mark_insn_zext
> > > to help architectures where zext has to be inserted by JIT.
> > > I'm not sure whether new liveness has to do it as well.
> >
> > As far as I understand, this is important for one check in
> > propagate_liveness(). And that check means something like:
> > "if this register was read as a 64-bit value, remember that
> > it needs zero extension on 32-bit load".
> >
> > Meaning that either DFA would need to track this bit of information
> > (should be simple), or more zero extensions would be added.
>
> Right. New liveness doesn't break zext, but makes it worse
> for arches that need it. We would need to quantify the impact.
> iirc it was noticeable enough that we added this support.
I'm surprised that no test_progs or test_verifier tests are broken.
Agreed that this needs to be quantified.
[...]
> > Two comparisons are made:
> > - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
> > sched_ext programs that hit 1M instructions limit; positive impact
> > would have indicated a bug);
>
> Let's figure out what is causing rusty_init[_task]
> to explode.
> And proceed with this set in parallel.
Will do.
> > - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
>
> I don't read it as a big negative.
> cls_redirect and balancer_ingress need to be understood,
> but nothing exploded to hit 1M insns,
> so hopefully bare minimum stack tracking would do the trick.
>
> So in terms of priorities, let's land this set, then
> figure out rusty_init,
> figure out read32 vs 64 for zext,
> at that time we may land -no-rm.
> Then stack tracking.
Tbh, I think that landing dfa-opts-no-rm separately from
dfa-opts-no-rm-sl doesn't make things much simpler.
The register-chain-based liveness computation would still be in place.
I'd try to figure out how to make the dfa-opts-no-rm-sl variant faster
first.
* Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
2025-03-02 0:09 ` Alexei Starovoitov
2025-03-03 19:28 ` Eduard Zingerman
@ 2025-03-05 9:00 ` Eduard Zingerman
1 sibling, 0 replies; 11+ messages in thread
From: Eduard Zingerman @ 2025-03-05 9:00 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kernel Team, Yonghong Song, Tejun Heo
On Sat, 2025-03-01 at 16:09 -0800, Alexei Starovoitov wrote:
> On Fri, Feb 28, 2025 at 8:40 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
[...]
> > Two comparisons are made:
> > - dfa-opts vs dfa-opts-no-rm (small negative impact, except two
> > sched_ext programs that hit 1M instructions limit; positive impact
> > would have indicated a bug);
>
> Let's figure out what is causing rusty_init[_task]
> to explode.
> And proceed with this set in parallel.
The regression for rusty_init was caused by "r0" being incorrectly
marked as used because of the "may_goto" instruction. This is fixed by:
https://lore.kernel.org/bpf/20250305085436.2731464-1-eddyz87@gmail.com/
> > - dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
>
> I don't read it as a big negative.
> cls_redirect and balancer_ingress need to be understood,
> but nothing exploded to hit 1M insns,
> so hopefully bare minimum stack tracking would do the trick.
>
> So in terms of priorities, let's land this set, then
> figure out rusty_init,
> figure out read32 vs 64 for zext,
> at that time we may land -no-rm.
> Then stack tracking.
end of thread, other threads:[~2025-03-05 9:00 UTC | newest]
Thread overview: 11+ messages
2025-02-28 6:00 [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Eduard Zingerman
2025-02-28 6:00 ` [PATCH bpf-next v1 1/3] " Eduard Zingerman
2025-03-01 2:01 ` Alexei Starovoitov
2025-03-01 2:09 ` Eduard Zingerman
2025-02-28 6:00 ` [PATCH bpf-next v1 2/3] bpf: use register liveness information for func_states_equal Eduard Zingerman
2025-02-28 6:00 ` [PATCH bpf-next v1 3/3] selftests/bpf: test cases for compute_live_registers() Eduard Zingerman
2025-03-01 2:10 ` [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis Alexei Starovoitov
2025-03-01 4:40 ` Eduard Zingerman
2025-03-02 0:09 ` Alexei Starovoitov
2025-03-03 19:28 ` Eduard Zingerman
2025-03-05 9:00 ` Eduard Zingerman