* [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills
@ 2023-10-31 5:03 Andrii Nakryiko
2023-10-31 5:03 ` [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states Andrii Nakryiko
` (6 more replies)
0 siblings, 7 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:03 UTC
To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team
Add support to the BPF verifier to track register spills/fills to/from the
stack regardless of whether they were done through the read-only R10 register
(which is the only form supported today) or through a general register after
copying R10 into it, while also potentially adjusting the offset.
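As an illustration, a pattern like the following (similar to what the new
stack_slot_aliases_precision selftest added later in this series exercises)
is now recognized by precision backtracking:

  r7 = r10;
  r7 += -8;               /* r7 = r10 - 8 */
  *(u64 *)(r7 + 0) = r0;  /* spill through non-r10 register */
  r4 = *(u64 *)(r10 - 8); /* fill through r10, same stack slot */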
Once we add this generic register spill/fill support to precision
backtracking, we can take advantage of it to stop doing eager STACK_ZERO
conversion on register spills. Instead, we can rely on the (im)precision of
the spilled const zero register to improve verifier state pruning efficiency.
Using a const zero register to initialize stack slots is very common with
__builtin_memset() usage or when just zero-initializing variables on the
stack, and it causes unnecessary state duplication, as that STACK_ZERO
knowledge is often not necessary for correctness, because those zero values
are never used in a precise context. Thus, relying on register imprecision
helps tremendously, especially in real-world BPF programs.
To make a spilled const zero register behave completely equivalently to
STACK_ZERO, we need to improve a few other small pieces, which is done in the
second part of the patch set. See individual patches for details. There are
also two small bug fixes spotted during STACK_ZERO debugging.
Andrii Nakryiko (7):
bpf: use common jump (instruction) history across all states
bpf: support non-r10 register spill/fill to/from stack in precision
tracking
bpf: enforce precision for r0 on callback return
bpf: fix check for attempt to corrupt spilled pointer
bpf: preserve STACK_ZERO slots on partial reg spills
bpf: preserve constant zero when doing partial register restore
bpf: track aligned STACK_ZERO cases as imprecise spilled registers
include/linux/bpf_verifier.h | 34 ++-
kernel/bpf/verifier.c | 274 ++++++++++--------
.../bpf/progs/verifier_subprog_precision.c | 83 +++++-
.../testing/selftests/bpf/verifier/precise.c | 38 ++-
4 files changed, 285 insertions(+), 144 deletions(-)
--
2.34.1
* [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-10-31 5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
@ 2023-10-31 5:03 ` Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
` (5 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:03 UTC
To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team
Instead of allocating and copying jump history each time we enqueue a child
verifier state, switch to a model where we use one common dynamically sized
array of instruction jumps across all states.
The key observation for proving this correct is that jmp_history is only
relevant while a state is active: it is either the current state (and thus we
are actively modifying jump history and no other state can interfere with us)
or a checkpointed state with some children still active (either enqueued or
being current).
In the latter case our portion of jump history is finalized and won't
change or grow, so as long as we keep it immutable until the state is
finalized, we are good.
Now, when a state is finalized and put into the state hash for potential
future pruning lookups, jump history is not used anymore. This is because
jump history is only used by the precision marking logic, and we never modify
precision markings for finalized states.
So, instead of each state having its own small jump history, we keep one
global dynamically-sized jump history, where each state in the current DFS
path from the root to the active state remembers its portion of jump history.
The current state can append to this history, but cannot modify any of its
parents' portions.
Because the jmp_history array can be grown through realloc, states don't
keep pointers; they instead maintain two indices, [start, end), into the
global jump history array. The end index is exclusive, so start == end means
there is no relevant jump history.
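To make this concrete, here is a hypothetical layout (numbers chosen purely
for illustration): a parent state checkpointed with range [0, 5) and a child
that appended three more entries, ending up with range [5, 8). Backtracking
in the child walks entries 7 down to 5, then continues into the parent's
immutable [0, 5) portion:

  /* env->insn_hist entries:  0  1  2  3  4 | 5  6  7 */
  /* parent: insn_hist_start == 0, insn_hist_end == 5 */
  /* child:  insn_hist_start == 5, insn_hist_end == 8 */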
This should eliminate a lot of allocations and minimize overall memory
usage (but I haven't benchmarked on real hardware, and QEMU benchmarking
is too noisy).
Also, in the next patch we'll extend jump history to maintain additional
markings for some instructions even if there was no jump, so in preparation
for that, rename this thing to the more generic "instruction history".
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
include/linux/bpf_verifier.h | 8 +++--
kernel/bpf/verifier.c | 68 ++++++++++++++++--------------------
2 files changed, 35 insertions(+), 41 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 24213a99cc79..b57696145111 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -309,7 +309,7 @@ struct bpf_func_state {
struct bpf_stack_state *stack;
};
-struct bpf_idx_pair {
+struct bpf_insn_hist_entry {
u32 prev_idx;
u32 idx;
};
@@ -397,8 +397,8 @@ struct bpf_verifier_state {
* For most states jmp_history_cnt is [0-3].
* For loops can go up to ~40.
*/
- struct bpf_idx_pair *jmp_history;
- u32 jmp_history_cnt;
+ u32 insn_hist_start;
+ u32 insn_hist_end;
u32 dfs_depth;
};
@@ -666,6 +666,8 @@ struct bpf_verifier_env {
* e.g., in reg_type_str() to generate reg_type string
*/
char tmp_str_buf[TMP_STR_BUF_LEN];
+ struct bpf_insn_hist_entry *insn_hist;
+ u32 insn_hist_cap;
};
__printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 857d76694517..2905ce2e8b34 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1737,13 +1737,6 @@ static void free_func_state(struct bpf_func_state *state)
kfree(state);
}
-static void clear_jmp_history(struct bpf_verifier_state *state)
-{
- kfree(state->jmp_history);
- state->jmp_history = NULL;
- state->jmp_history_cnt = 0;
-}
-
static void free_verifier_state(struct bpf_verifier_state *state,
bool free_self)
{
@@ -1753,7 +1746,6 @@ static void free_verifier_state(struct bpf_verifier_state *state,
free_func_state(state->frame[i]);
state->frame[i] = NULL;
}
- clear_jmp_history(state);
if (free_self)
kfree(state);
}
@@ -1779,13 +1771,6 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
struct bpf_func_state *dst;
int i, err;
- dst_state->jmp_history = copy_array(dst_state->jmp_history, src->jmp_history,
- src->jmp_history_cnt, sizeof(struct bpf_idx_pair),
- GFP_USER);
- if (!dst_state->jmp_history)
- return -ENOMEM;
- dst_state->jmp_history_cnt = src->jmp_history_cnt;
-
/* if dst has more stack frames then src frame, free them, this is also
* necessary in case of exceptional exits using bpf_throw.
*/
@@ -1802,6 +1787,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
dst_state->parent = src->parent;
dst_state->first_insn_idx = src->first_insn_idx;
dst_state->last_insn_idx = src->last_insn_idx;
+ dst_state->insn_hist_start = src->insn_hist_start;
+ dst_state->insn_hist_end = src->insn_hist_end;
dst_state->dfs_depth = src->dfs_depth;
dst_state->used_as_loop_entry = src->used_as_loop_entry;
for (i = 0; i <= src->curframe; i++) {
@@ -3495,40 +3482,44 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
static int push_jmp_history(struct bpf_verifier_env *env,
struct bpf_verifier_state *cur)
{
- u32 cnt = cur->jmp_history_cnt;
- struct bpf_idx_pair *p;
+ struct bpf_insn_hist_entry *p;
size_t alloc_size;
if (!is_jmp_point(env, env->insn_idx))
return 0;
- cnt++;
- alloc_size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p)));
- p = krealloc(cur->jmp_history, alloc_size, GFP_USER);
- if (!p)
- return -ENOMEM;
- p[cnt - 1].idx = env->insn_idx;
- p[cnt - 1].prev_idx = env->prev_insn_idx;
- cur->jmp_history = p;
- cur->jmp_history_cnt = cnt;
+ if (cur->insn_hist_end + 1 > env->insn_hist_cap) {
+ alloc_size = size_mul(cur->insn_hist_end + 1, sizeof(*p));
+ alloc_size = kmalloc_size_roundup(alloc_size);
+ p = krealloc(env->insn_hist, alloc_size, GFP_USER);
+ if (!p)
+ return -ENOMEM;
+ env->insn_hist = p;
+ env->insn_hist_cap = alloc_size / sizeof(*p);
+ }
+
+ p = &env->insn_hist[cur->insn_hist_end];
+ p->idx = env->insn_idx;
+ p->prev_idx = env->prev_insn_idx;
+ cur->insn_hist_end++;
return 0;
}
/* Backtrack one insn at a time. If idx is not at the top of recorded
* history then previous instruction came from straight line execution.
*/
-static int get_prev_insn_idx(struct bpf_verifier_state *st, int i,
- u32 *history)
+static int get_prev_insn_idx(const struct bpf_verifier_env *env, int insn_idx,
+ u32 hist_start, u32 *hist_endp)
{
- u32 cnt = *history;
+ u32 hist_end = *hist_endp;
- if (cnt && st->jmp_history[cnt - 1].idx == i) {
- i = st->jmp_history[cnt - 1].prev_idx;
- (*history)--;
+ if (hist_end > hist_start && env->insn_hist[hist_end - 1].idx == insn_idx) {
+ insn_idx = env->insn_hist[hist_end - 1].prev_idx;
+ (*hist_endp)--;
} else {
- i--;
+ insn_idx--;
}
- return i;
+ return insn_idx;
}
static const char *disasm_kfunc_name(void *data, const struct bpf_insn *insn)
@@ -4200,7 +4191,7 @@ static int mark_precise_scalar_ids(struct bpf_verifier_env *env, struct bpf_veri
* SCALARS, as well as any other registers and slots that contribute to
* a tracked state of given registers/stack slots, depending on specific BPF
* assembly instructions (see backtrack_insns() for exact instruction handling
- * logic). This backtracking relies on recorded jmp_history and is able to
+ * logic). This backtracking relies on recorded insn_history and is able to
* traverse entire chain of parent states. This process ends only when all the
* necessary registers/slots and their transitive dependencies are marked as
* precise.
@@ -4317,7 +4308,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
for (;;) {
DECLARE_BITMAP(mask, 64);
- u32 history = st->jmp_history_cnt;
+ u32 hist_end = st->insn_hist_end;
if (env->log.level & BPF_LOG_LEVEL2) {
verbose(env, "mark_precise: frame%d: last_idx %d first_idx %d subseq_idx %d \n",
@@ -4399,7 +4390,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
if (i == first_idx)
break;
subseq_idx = i;
- i = get_prev_insn_idx(st, i, &history);
+ i = get_prev_insn_idx(env, i, st->insn_hist_start, &hist_end);
if (i >= env->prog->len) {
/* This can happen if backtracking reached insn 0
* and there are still reg_mask or stack_mask
@@ -17109,8 +17100,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
cur->parent = new;
cur->first_insn_idx = insn_idx;
+ cur->insn_hist_start = cur->insn_hist_end;
cur->dfs_depth = new->dfs_depth + 1;
- clear_jmp_history(cur);
new_sl->next = *explored_state(env, insn_idx);
*explored_state(env, insn_idx) = new_sl;
/* connect new state to parentage chain. Current frame needs all
@@ -20807,6 +20798,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
if (!is_priv)
mutex_unlock(&bpf_verifier_lock);
vfree(env->insn_aux_data);
+ kvfree(env->insn_hist);
err_free_env:
kfree(env);
return ret;
--
2.34.1
* [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
2023-10-31 5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
2023-10-31 5:03 ` [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states Andrii Nakryiko
@ 2023-10-31 5:03 ` Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return Andrii Nakryiko
` (4 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:03 UTC
To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Tao Lyu
Use the newly optimized instruction history to record instructions that
performed register spill/fill to/from the stack, regardless of whether this
was done through the read-only r10 register or any other register after
copying r10 into it *and* potentially adjusting the offset.
To make this work reliably, we push extra per-instruction flags into the
instruction history, encoding the stack slot index (spi) and stack frame
number in 10 extra flag bits that we take away from prev_idx in the
instruction history. We don't touch the idx field for maximum performance,
as it's checked most frequently during backtracking.
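Concretely, the encoding on the spill/fill side and the decoding during
backtracking look like this (excerpted from the diff below):

  /* record frame number and stack slot index in 10 flag bits */
  int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | state->frameno;

  /* later, during precision backtracking, unpack them again */
  spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
  fr = hist->flags & INSN_F_FRAMENO_MASK;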
This change removes basically the last remaining practical limitation of the
precision backtracking logic in the BPF verifier. It fixes known
deficiencies, but also opens up new opportunities to reduce the number of
verified states, explored in the next patch.
There are only three differences in selftests' BPF object files according to
veristat, all in the positive direction (fewer states).
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
-------------------------------------- ------------- --------- --------- ------------- ---------- ---------- -------------
test_cls_redirect_dynptr.bpf.linked3.o cls_redirect 2987 2864 -123 (-4.12%) 240 231 -9 (-3.75%)
xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82848 82661 -187 (-0.23%) 5107 5073 -34 (-0.67%)
xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 85116 84964 -152 (-0.18%) 5162 5130 -32 (-0.62%)
Reported-by: Tao Lyu <tao.lyu@epfl.ch>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
include/linux/bpf_verifier.h | 26 +++-
kernel/bpf/verifier.c | 145 +++++++++---------
.../bpf/progs/verifier_subprog_precision.c | 83 +++++++++-
.../testing/selftests/bpf/verifier/precise.c | 38 +++--
4 files changed, 197 insertions(+), 95 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index b57696145111..7940c0861198 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -309,12 +309,34 @@ struct bpf_func_state {
struct bpf_stack_state *stack;
};
+#define MAX_CALL_FRAMES 8
+
+/* instruction history flags, used in bpf_insn_hist_entry.flags field */
+enum {
+ /* instruction references stack slot through PTR_TO_STACK register;
+ * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
+ * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512,
+ * 8 bytes per slot, so slot index (spi) is [0, 63])
+ */
+ INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */
+
+ INSN_F_SPI_MASK = 0x3f, /* 6 bits */
+ INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */
+
+ INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */
+};
+
+static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
+static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
+
struct bpf_insn_hist_entry {
- u32 prev_idx;
u32 idx;
+ /* insn idx can't be bigger than 1 million */
+ u32 prev_idx : 22;
+ /* special flags, e.g., whether insn is doing register stack spill/load */
+ u32 flags : 10;
};
-#define MAX_CALL_FRAMES 8
/* Maximum number of register states that can exist at once */
#define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES)
struct bpf_verifier_state {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2905ce2e8b34..fbb779583d52 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3479,14 +3479,20 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
}
/* for any branch, call, exit record the history of jmps in the given state */
-static int push_jmp_history(struct bpf_verifier_env *env,
- struct bpf_verifier_state *cur)
+static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
+ int insn_flags)
{
struct bpf_insn_hist_entry *p;
size_t alloc_size;
- if (!is_jmp_point(env, env->insn_idx))
+ /* combine instruction flags if we already recorded this instruction */
+ if (cur->insn_hist_end > cur->insn_hist_start &&
+ (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
+ p->idx == env->insn_idx &&
+ p->prev_idx == env->prev_insn_idx) {
+ p->flags |= insn_flags;
return 0;
+ }
if (cur->insn_hist_end + 1 > env->insn_hist_cap) {
alloc_size = size_mul(cur->insn_hist_end + 1, sizeof(*p));
@@ -3501,14 +3507,23 @@ static int push_jmp_history(struct bpf_verifier_env *env,
p = &env->insn_hist[cur->insn_hist_end];
p->idx = env->insn_idx;
p->prev_idx = env->prev_insn_idx;
+ p->flags = insn_flags;
cur->insn_hist_end++;
return 0;
}
+static struct bpf_insn_hist_entry *get_hist_insn_entry(struct bpf_verifier_env *env,
+ u32 hist_start, u32 hist_end, int insn_idx)
+{
+ if (hist_end > hist_start && env->insn_hist[hist_end - 1].idx == insn_idx)
+ return &env->insn_hist[hist_end - 1];
+ return NULL;
+}
+
/* Backtrack one insn at a time. If idx is not at the top of recorded
* history then previous instruction came from straight line execution.
*/
-static int get_prev_insn_idx(const struct bpf_verifier_env *env, int insn_idx,
+static int get_prev_insn_idx(struct bpf_verifier_env *env, int insn_idx,
u32 hist_start, u32 *hist_endp)
{
u32 hist_end = *hist_endp;
@@ -3649,9 +3664,14 @@ static inline bool bt_is_reg_set(struct backtrack_state *bt, u32 reg)
return bt->reg_masks[bt->frame] & (1 << reg);
}
+static inline bool bt_is_frame_slot_set(struct backtrack_state *bt, u32 frame, u32 slot)
+{
+ return bt->stack_masks[frame] & (1ull << slot);
+}
+
static inline bool bt_is_slot_set(struct backtrack_state *bt, u32 slot)
{
- return bt->stack_masks[bt->frame] & (1ull << slot);
+ return bt_is_frame_slot_set(bt, bt->frame, slot);
}
/* format registers bitmask, e.g., "r0,r2,r4" for 0x15 mask */
@@ -3703,7 +3723,7 @@ static void fmt_stack_mask(char *buf, ssize_t buf_sz, u64 stack_mask)
* - *was* processed previously during backtracking.
*/
static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
- struct backtrack_state *bt)
+ struct bpf_insn_hist_entry *hist, struct backtrack_state *bt)
{
const struct bpf_insn_cbs cbs = {
.cb_call = disasm_kfunc_name,
@@ -3716,7 +3736,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
u8 mode = BPF_MODE(insn->code);
u32 dreg = insn->dst_reg;
u32 sreg = insn->src_reg;
- u32 spi, i;
+ u32 spi, i, fr;
if (insn->code == 0)
return 0;
@@ -3772,20 +3792,15 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
* by 'precise' mark in corresponding register of this state.
* No further tracking necessary.
*/
- if (insn->src_reg != BPF_REG_FP)
+ if (!hist || !(hist->flags & INSN_F_STACK_ACCESS))
return 0;
-
/* dreg = *(u64 *)[fp - off] was a fill from the stack.
* that [fp - off] slot contains scalar that needs to be
* tracked with precision
*/
- spi = (-insn->off - 1) / BPF_REG_SIZE;
- if (spi >= 64) {
- verbose(env, "BUG spi %d\n", spi);
- WARN_ONCE(1, "verifier backtracking bug");
- return -EFAULT;
- }
- bt_set_slot(bt, spi);
+ spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
+ fr = hist->flags & INSN_F_FRAMENO_MASK;
+ bt_set_frame_slot(bt, fr, spi);
} else if (class == BPF_STX || class == BPF_ST) {
if (bt_is_reg_set(bt, dreg))
/* stx & st shouldn't be using _scalar_ dst_reg
@@ -3794,17 +3809,13 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
*/
return -ENOTSUPP;
/* scalars can only be spilled into stack */
- if (insn->dst_reg != BPF_REG_FP)
+ if (!hist || !(hist->flags & INSN_F_STACK_ACCESS))
return 0;
- spi = (-insn->off - 1) / BPF_REG_SIZE;
- if (spi >= 64) {
- verbose(env, "BUG spi %d\n", spi);
- WARN_ONCE(1, "verifier backtracking bug");
- return -EFAULT;
- }
- if (!bt_is_slot_set(bt, spi))
+ spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
+ fr = hist->flags & INSN_F_FRAMENO_MASK;
+ if (!bt_is_frame_slot_set(bt, fr, spi))
return 0;
- bt_clear_slot(bt, spi);
+ bt_clear_frame_slot(bt, fr, spi);
if (class == BPF_STX)
bt_set_reg(bt, sreg);
} else if (class == BPF_JMP || class == BPF_JMP32) {
@@ -3848,10 +3859,14 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
WARN_ONCE(1, "verifier backtracking bug");
return -EFAULT;
}
- /* we don't track register spills perfectly,
- * so fallback to force-precise instead of failing */
- if (bt_stack_mask(bt) != 0)
- return -ENOTSUPP;
+ /* we are now tracking register spills correctly,
+ * so any instance of leftover slots is a bug
+ */
+ if (bt_stack_mask(bt) != 0) {
+ verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt));
+ WARN_ONCE(1, "verifier backtracking bug (subprog leftover stack slots)");
+ return -EFAULT;
+ }
/* propagate r1-r5 to the caller */
for (i = BPF_REG_1; i <= BPF_REG_5; i++) {
if (bt_is_reg_set(bt, i)) {
@@ -3879,8 +3894,11 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
WARN_ONCE(1, "verifier backtracking bug");
return -EFAULT;
}
- if (bt_stack_mask(bt) != 0)
- return -ENOTSUPP;
+ if (bt_stack_mask(bt) != 0) {
+ verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt));
+ WARN_ONCE(1, "verifier backtracking bug (callback leftover stack slots)");
+ return -EFAULT;
+ }
/* clear r1-r5 in callback subprog's mask */
for (i = BPF_REG_1; i <= BPF_REG_5; i++)
bt_clear_reg(bt, i);
@@ -4308,7 +4326,8 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
for (;;) {
DECLARE_BITMAP(mask, 64);
- u32 hist_end = st->insn_hist_end;
+ u32 hist_start = st->insn_hist_start, hist_end = st->insn_hist_end;
+ struct bpf_insn_hist_entry *hist;
if (env->log.level & BPF_LOG_LEVEL2) {
verbose(env, "mark_precise: frame%d: last_idx %d first_idx %d subseq_idx %d \n",
@@ -4372,7 +4391,8 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
err = 0;
skip_first = false;
} else {
- err = backtrack_insn(env, i, subseq_idx, bt);
+ hist = get_hist_insn_entry(env, hist_start, hist_end, i);
+ err = backtrack_insn(env, i, subseq_idx, hist, bt);
}
if (err == -ENOTSUPP) {
mark_all_scalars_precise(env, env->cur_state);
@@ -4390,7 +4410,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
if (i == first_idx)
break;
subseq_idx = i;
- i = get_prev_insn_idx(env, i, st->insn_hist_start, &hist_end);
+ i = get_prev_insn_idx(env, i, hist_start, &hist_end);
if (i >= env->prog->len) {
/* This can happen if backtracking reached insn 0
* and there are still reg_mask or stack_mask
@@ -4425,22 +4445,10 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno)
bitmap_from_u64(mask, bt_frame_stack_mask(bt, fr));
for_each_set_bit(i, mask, 64) {
if (i >= func->allocated_stack / BPF_REG_SIZE) {
- /* the sequence of instructions:
- * 2: (bf) r3 = r10
- * 3: (7b) *(u64 *)(r3 -8) = r0
- * 4: (79) r4 = *(u64 *)(r10 -8)
- * doesn't contain jmps. It's backtracked
- * as a single block.
- * During backtracking insn 3 is not recognized as
- * stack access, so at the end of backtracking
- * stack slot fp-8 is still marked in stack_mask.
- * However the parent state may not have accessed
- * fp-8 and it's "unallocated" stack space.
- * In such case fallback to conservative.
- */
- mark_all_scalars_precise(env, env->cur_state);
- bt_reset(bt);
- return 0;
+ verbose(env, "BUG backtracking (stack slot %d, total slots %d)\n",
+ i, func->allocated_stack / BPF_REG_SIZE);
+ WARN_ONCE(1, "verifier backtracking bug (stack slot out of bounds)");
+ return -EFAULT;
}
if (!is_spilled_scalar_reg(&func->stack[i])) {
@@ -4605,7 +4613,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
int i, slot = -off - 1, spi = slot / BPF_REG_SIZE, err;
struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
struct bpf_reg_state *reg = NULL;
- u32 dst_reg = insn->dst_reg;
+ int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | state->frameno;
err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE));
if (err)
@@ -4646,17 +4654,6 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
mark_stack_slot_scratched(env, spi);
if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
!register_is_null(reg) && env->bpf_capable) {
- if (dst_reg != BPF_REG_FP) {
- /* The backtracking logic can only recognize explicit
- * stack slot address like [fp - 8]. Other spill of
- * scalar via different register has to be conservative.
- * Backtrack from here and mark all registers as precise
- * that contributed into 'reg' being a constant.
- */
- err = mark_chain_precision(env, value_regno);
- if (err)
- return err;
- }
save_register_state(state, spi, reg, size);
/* Break the relation on a narrowing spill. */
if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
@@ -4668,6 +4665,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
__mark_reg_known(&fake_reg, (u32)insn->imm);
fake_reg.type = SCALAR_VALUE;
save_register_state(state, spi, &fake_reg, size);
+ insn_flags = 0; /* not a register spill */
} else if (reg && is_spillable_regtype(reg->type)) {
/* register containing pointer is being spilled into stack */
if (size != BPF_REG_SIZE) {
@@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
/* Mark slots affected by this stack write. */
for (i = 0; i < size; i++)
- state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
- type;
+ state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
+ insn_flags = 0; /* not a register spill */
}
+
+ if (insn_flags)
+ return push_insn_history(env, env->cur_state, insn_flags);
return 0;
}
@@ -4908,6 +4909,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
struct bpf_reg_state *reg;
u8 *stype, type;
+ int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | reg_state->frameno;
stype = reg_state->stack[spi].slot_type;
reg = &reg_state->stack[spi].spilled_ptr;
@@ -4953,12 +4955,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
return -EACCES;
}
mark_reg_unknown(env, state->regs, dst_regno);
+ insn_flags = 0; /* not restoring original register state */
}
state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
- return 0;
- }
-
- if (dst_regno >= 0) {
+ } else if (dst_regno >= 0) {
/* restore register state from stack */
copy_register_state(&state->regs[dst_regno], reg);
/* mark reg as written since spilled pointer state likely
@@ -4994,7 +4994,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64);
if (dst_regno >= 0)
mark_reg_stack_read(env, reg_state, off, off + size, dst_regno);
+ insn_flags = 0; /* we are not restoring spilled register */
}
+ if (insn_flags)
+ return push_insn_history(env, env->cur_state, insn_flags);
return 0;
}
@@ -7125,7 +7128,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
if (err)
return err;
-
return 0;
}
@@ -17001,7 +17003,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
* the precision needs to be propagated back in
* the current state.
*/
- err = err ? : push_jmp_history(env, cur);
+ if (is_jmp_point(env, env->insn_idx))
+ err = err ? : push_insn_history(env, cur, 0);
err = err ? : propagate_precision(env, &sl->state);
if (err)
return err;
@@ -17265,7 +17268,7 @@ static int do_check(struct bpf_verifier_env *env)
}
if (is_jmp_point(env, env->insn_idx)) {
- err = push_jmp_history(env, state);
+ err = push_insn_history(env, state, 0);
if (err)
return err;
}
diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
index db6b3143338b..88c4207c6b4c 100644
--- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
+++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
@@ -487,7 +487,24 @@ __success __log_level(2)
* so we won't be able to mark stack slot fp-8 as precise, and so will
* fallback to forcing all as precise
*/
-__msg("mark_precise: frame0: falling back to forcing all scalars precise")
+__msg("10: (0f) r1 += r7")
+__msg("mark_precise: frame0: last_idx 10 first_idx 7 subseq_idx -1")
+__msg("mark_precise: frame0: regs=r7 stack= before 9: (bf) r1 = r8")
+__msg("mark_precise: frame0: regs=r7 stack= before 8: (27) r7 *= 4")
+__msg("mark_precise: frame0: regs=r7 stack= before 7: (79) r7 = *(u64 *)(r10 -8)")
+__msg("mark_precise: frame0: parent state regs= stack=-8: R0_w=2 R6_w=1 R8_rw=map_value(off=0,ks=4,vs=16,imm=0) R10=fp0 fp-8_rw=P1")
+__msg("mark_precise: frame0: last_idx 18 first_idx 0 subseq_idx 7")
+__msg("mark_precise: frame0: regs= stack=-8 before 18: (95) exit")
+__msg("mark_precise: frame1: regs= stack= before 17: (0f) r0 += r2")
+__msg("mark_precise: frame1: regs= stack= before 16: (79) r2 = *(u64 *)(r1 +0)")
+__msg("mark_precise: frame1: regs= stack= before 15: (79) r0 = *(u64 *)(r10 -16)")
+__msg("mark_precise: frame1: regs= stack= before 14: (7b) *(u64 *)(r10 -16) = r2")
+__msg("mark_precise: frame1: regs= stack= before 13: (7b) *(u64 *)(r1 +0) = r2")
+__msg("mark_precise: frame1: regs=r2 stack= before 6: (85) call pc+6")
+__msg("mark_precise: frame0: regs=r2 stack= before 5: (bf) r2 = r6")
+__msg("mark_precise: frame0: regs=r6 stack= before 4: (07) r1 += -8")
+__msg("mark_precise: frame0: regs=r6 stack= before 3: (bf) r1 = r10")
+__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
__naked int subprog_spill_into_parent_stack_slot_precise(void)
{
asm volatile (
@@ -522,14 +539,68 @@ __naked int subprog_spill_into_parent_stack_slot_precise(void)
);
}
-__naked __noinline __used
-static __u64 subprog_with_checkpoint(void)
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("17: (0f) r1 += r0")
+__msg("mark_precise: frame0: last_idx 17 first_idx 0 subseq_idx -1")
+__msg("mark_precise: frame0: regs=r0 stack= before 16: (bf) r1 = r7")
+__msg("mark_precise: frame0: regs=r0 stack= before 15: (27) r0 *= 4")
+__msg("mark_precise: frame0: regs=r0 stack= before 14: (79) r0 = *(u64 *)(r10 -16)")
+__msg("mark_precise: frame0: regs= stack=-16 before 13: (7b) *(u64 *)(r7 -8) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 12: (79) r0 = *(u64 *)(r8 +16)")
+__msg("mark_precise: frame0: regs= stack=-16 before 11: (7b) *(u64 *)(r8 +16) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 10: (79) r0 = *(u64 *)(r7 -8)")
+__msg("mark_precise: frame0: regs= stack=-16 before 9: (7b) *(u64 *)(r10 -16) = r0")
+__msg("mark_precise: frame0: regs=r0 stack= before 8: (07) r8 += -32")
+__msg("mark_precise: frame0: regs=r0 stack= before 7: (bf) r8 = r10")
+__msg("mark_precise: frame0: regs=r0 stack= before 6: (07) r7 += -8")
+__msg("mark_precise: frame0: regs=r0 stack= before 5: (bf) r7 = r10")
+__msg("mark_precise: frame0: regs=r0 stack= before 21: (95) exit")
+__msg("mark_precise: frame1: regs=r0 stack= before 20: (bf) r0 = r1")
+__msg("mark_precise: frame1: regs=r1 stack= before 4: (85) call pc+15")
+__msg("mark_precise: frame0: regs=r1 stack= before 3: (bf) r1 = r6")
+__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
+__naked int stack_slot_aliases_precision(void)
{
asm volatile (
- "r0 = 0;"
- /* guaranteed checkpoint if BPF_F_TEST_STATE_FREQ is used */
- "goto +0;"
+ "r6 = 1;"
+ /* pass r6 through r1 into subprog to get it back as r0;
+ * this whole chain will have to be marked as precise later
+ */
+ "r1 = r6;"
+ "call identity_subprog;"
+ /* let's setup two registers that are aliased to r10 */
+ "r7 = r10;"
+ "r7 += -8;" /* r7 = r10 - 8 */
+ "r8 = r10;"
+ "r8 += -32;" /* r8 = r10 - 32 */
+ /* now spill subprog's return value (a r6 -> r1 -> r0 chain)
+ * a few times through different stack pointer regs, making
+ * sure to use r10, r7, and r8 both in LDX and STX insns, and
+ * *importantly* also using a combination of const var_off and
+ * insn->off to validate that we record final stack slot
+ * correctly, instead of relying on just insn->off derivation,
+ * which is only valid for r10-based stack offset
+ */
+ "*(u64 *)(r10 - 16) = r0;"
+ "r0 = *(u64 *)(r7 - 8);" /* r7 - 8 == r10 - 16 */
+ "*(u64 *)(r8 + 16) = r0;" /* r8 + 16 = r10 - 16 */
+ "r0 = *(u64 *)(r8 + 16);"
+ "*(u64 *)(r7 - 8) = r0;"
+ "r0 = *(u64 *)(r10 - 16);"
+ /* get ready to use r0 as an index into array to force precision */
+ "r0 *= 4;"
+ "r1 = %[vals];"
+ /* here r0->r1->r6 chain is forced to be precise and has to be
+ * propagated back to the beginning, including through the
+ * subprog call and all the stack spills and loads
+ */
+ "r1 += r0;"
+ "r0 = *(u32 *)(r1 + 0);"
"exit;"
+ :
+ : __imm_ptr(vals)
+ : __clobber_common, "r6"
);
}
diff --git a/tools/testing/selftests/bpf/verifier/precise.c b/tools/testing/selftests/bpf/verifier/precise.c
index 0d84dd1f38b6..8a2ff81d8350 100644
--- a/tools/testing/selftests/bpf/verifier/precise.c
+++ b/tools/testing/selftests/bpf/verifier/precise.c
@@ -140,10 +140,11 @@
.result = REJECT,
},
{
- "precise: ST insn causing spi > allocated_stack",
+ "precise: ST zero to stack insn is supported",
.insns = {
BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
+ /* not a register spill, so we stop precision propagation for R4 here */
BPF_ST_MEM(BPF_DW, BPF_REG_3, -8, 0),
BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
BPF_MOV64_IMM(BPF_REG_0, -1),
@@ -157,11 +158,11 @@
mark_precise: frame0: last_idx 4 first_idx 2\
mark_precise: frame0: regs=r4 stack= before 4\
mark_precise: frame0: regs=r4 stack= before 3\
- mark_precise: frame0: regs= stack=-8 before 2\
- mark_precise: frame0: falling back to forcing all scalars precise\
- force_precise: frame0: forcing r0 to be precise\
mark_precise: frame0: last_idx 5 first_idx 5\
- mark_precise: frame0: parent state regs= stack=:",
+ mark_precise: frame0: parent state regs=r0 stack=:\
+ mark_precise: frame0: last_idx 4 first_idx 2\
+ mark_precise: frame0: regs=r0 stack= before 4\
+ 5: R0=-1 R4=0",
.result = VERBOSE_ACCEPT,
.retval = -1,
},
@@ -169,6 +170,8 @@
"precise: STX insn causing spi > allocated_stack",
.insns = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+ /* make later reg spill more interesting by having somewhat known scalar */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xff),
BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, -8),
@@ -179,18 +182,21 @@
},
.prog_type = BPF_PROG_TYPE_XDP,
.flags = BPF_F_TEST_STATE_FREQ,
- .errstr = "mark_precise: frame0: last_idx 6 first_idx 6\
+ .errstr = "mark_precise: frame0: last_idx 7 first_idx 7\
mark_precise: frame0: parent state regs=r4 stack=:\
- mark_precise: frame0: last_idx 5 first_idx 3\
- mark_precise: frame0: regs=r4 stack= before 5\
- mark_precise: frame0: regs=r4 stack= before 4\
- mark_precise: frame0: regs= stack=-8 before 3\
- mark_precise: frame0: falling back to forcing all scalars precise\
- force_precise: frame0: forcing r0 to be precise\
- force_precise: frame0: forcing r0 to be precise\
- force_precise: frame0: forcing r0 to be precise\
- force_precise: frame0: forcing r0 to be precise\
- mark_precise: frame0: last_idx 6 first_idx 6\
+ mark_precise: frame0: last_idx 6 first_idx 4\
+ mark_precise: frame0: regs=r4 stack= before 6: (b7) r0 = -1\
+ mark_precise: frame0: regs=r4 stack= before 5: (79) r4 = *(u64 *)(r10 -8)\
+ mark_precise: frame0: regs= stack=-8 before 4: (7b) *(u64 *)(r3 -8) = r0\
+ mark_precise: frame0: parent state regs=r0 stack=:\
+ mark_precise: frame0: last_idx 3 first_idx 3\
+ mark_precise: frame0: regs=r0 stack= before 3: (55) if r3 != 0x7b goto pc+0\
+ mark_precise: frame0: regs=r0 stack= before 2: (bf) r3 = r10\
+ mark_precise: frame0: regs=r0 stack= before 1: (57) r0 &= 255\
+ mark_precise: frame0: parent state regs=r0 stack=:\
+ mark_precise: frame0: last_idx 0 first_idx 0\
+ mark_precise: frame0: regs=r0 stack= before 0: (85) call bpf_get_prandom_u32#7\
+ mark_precise: frame0: last_idx 7 first_idx 7\
mark_precise: frame0: parent state regs= stack=:",
.result = VERBOSE_ACCEPT,
.retval = -1,
--
2.34.1
* [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-10-31 5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
2023-10-31 5:03 ` [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states Andrii Nakryiko
2023-10-31 5:03 ` [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
@ 2023-10-31 5:03 ` Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
` (3 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:03 UTC
To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team
Given that the verifier checks the actual value of r0 against the expected
callback return value range, r0 has to be precise, so we need to propagate
precision properly.
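A minimal sketch of the situation (hypothetical code, not from the series):
bpf_for_each_map_elem() callbacks must return 0 (continue) or 1 (stop), and
the verifier range-checks r0 on callback exit, so the concrete value in r0
is semantically relied upon:

  /* hypothetical callback; the verifier checks its return value against
   * the expected [0, 1] range on exit, hence r0 must be marked precise
   */
  static long check_elem(struct bpf_map *map, __u32 *key, __u64 *val, void *ctx)
  {
          return *val == 0; /* 1 stops iteration, 0 continues */
  }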
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/verifier.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fbb779583d52..098ba0e1a6ff 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9739,6 +9739,12 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
verbose(env, "R0 not a scalar value\n");
return -EACCES;
}
+
+ /* we are going to enforce precise value, mark r0 precise */
+ err = mark_chain_precision(env, BPF_REG_0);
+ if (err)
+ return err;
+
if (!tnum_in(range, r0->var_off)) {
verbose_invalid_scalar(env, r0, &range, "callback return", "R0");
return -EINVAL;
--
2.34.1
* [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer
2023-10-31 5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
` (2 preceding siblings ...)
2023-10-31 5:03 ` [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return Andrii Nakryiko
@ 2023-10-31 5:03 ` Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
` (2 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:03 UTC
To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team
When a register is spilled onto the stack as a 1/2/4-byte spill, we set
slot_type[BPF_REG_SIZE - 1] (plus potentially a few more below it, depending
on the actual spill size). So to check whether some stack slot has a spilled
register, we need to consult slot_type[7], not slot_type[0].
To avoid the need to remember and double-check this in the future, just use
the is_spilled_reg() helper.
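For illustration, after a 4-byte register spill into an 8-byte stack slot,
slot_type[] looks like this (an illustrative layout, with MISC/SPILL standing
for STACK_MISC/STACK_SPILL):

  slot_type = { MISC, MISC, MISC, MISC, SPILL, SPILL, SPILL, SPILL }
                [0]                                           [7]

so checking slot_type[0] misses the spill that slot_type[7] reveals.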
Fixes: 638f5b90d460 ("bpf: reduce verifier memory consumption")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/verifier.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 098ba0e1a6ff..82992c32c1bd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4622,7 +4622,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
* so it's aligned access and [off, off + size) are within stack limits
*/
if (!env->allow_ptr_leaks &&
- state->stack[spi].slot_type[0] == STACK_SPILL &&
+ is_spilled_reg(&state->stack[spi]) &&
size != BPF_REG_SIZE) {
verbose(env, "attempt to corrupt spilled pointer on stack\n");
return -EACCES;
--
2.34.1
* [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills
2023-10-31 5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
` (3 preceding siblings ...)
2023-10-31 5:03 ` [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
@ 2023-10-31 5:03 ` Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
2023-10-31 5:03 ` [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
6 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:03 UTC
To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team
Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve them in
situations where this is possible. E.g., when spilling a register as
1/2/4-byte subslots on the stack, the remaining bytes in the stack slot do
not automatically become unknown. If we knew they contained zeroes, we can
preserve those STACK_ZERO markers.
Add a helper, mark_stack_slot_misc(), similar to scrub_spilled_slot(), but
one that overwrites neither STACK_INVALID nor STACK_ZERO. Note that we need
to take into account the possibility of being in unprivileged mode, in which
case STACK_INVALID is forced to STACK_MISC for correctness, as treating
STACK_INVALID as equivalent to STACK_MISC is only enabled in privileged
mode.
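An illustrative before/after for a 2-byte spill into a slot previously known
to be all zeroes (ZERO/SPILL standing for STACK_ZERO/STACK_SPILL):

  before: { ZERO, ZERO, ZERO, ZERO, ZERO, ZERO, ZERO,  ZERO  }
  after:  { ZERO, ZERO, ZERO, ZERO, ZERO, ZERO, SPILL, SPILL }

Previously, the six untouched bytes would have degraded to STACK_MISC.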
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/verifier.c | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 82992c32c1bd..0eecc6b3109c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1355,6 +1355,21 @@ static void scrub_spilled_slot(u8 *stype)
*stype = STACK_MISC;
}
+/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
+ * case they are equivalent, or it's STACK_ZERO, in which case we preserve
+ * more precise STACK_ZERO.
+ * Note, in unprivileged mode leaving STACK_INVALID is wrong, so we take
+ * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
+ */
+static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
+{
+ if (*stype == STACK_ZERO)
+ return;
+ if (env->allow_ptr_leaks && *stype == STACK_INVALID)
+ return;
+ *stype = STACK_MISC;
+}
+
static void print_scalar_ranges(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg,
const char **sep)
@@ -4577,7 +4592,8 @@ static void copy_register_state(struct bpf_reg_state *dst, const struct bpf_reg_
dst->live = live;
}
-static void save_register_state(struct bpf_func_state *state,
+static void save_register_state(struct bpf_verifier_env *env,
+ struct bpf_func_state *state,
int spi, struct bpf_reg_state *reg,
int size)
{
@@ -4592,7 +4608,7 @@ static void save_register_state(struct bpf_func_state *state,
/* size < 8 bytes spill */
for (; i; i--)
- scrub_spilled_slot(&state->stack[spi].slot_type[i - 1]);
+ mark_stack_slot_misc(env, &state->stack[spi].slot_type[i - 1]);
}
static bool is_bpf_st_mem(struct bpf_insn *insn)
@@ -4654,7 +4670,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
mark_stack_slot_scratched(env, spi);
if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
!register_is_null(reg) && env->bpf_capable) {
- save_register_state(state, spi, reg, size);
+ save_register_state(env, state, spi, reg, size);
/* Break the relation on a narrowing spill. */
if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
state->stack[spi].spilled_ptr.id = 0;
@@ -4664,7 +4680,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
__mark_reg_known(&fake_reg, (u32)insn->imm);
fake_reg.type = SCALAR_VALUE;
- save_register_state(state, spi, &fake_reg, size);
+ save_register_state(env, state, spi, &fake_reg, size);
insn_flags = 0; /* not a register spill */
} else if (reg && is_spillable_regtype(reg->type)) {
/* register containing pointer is being spilled into stack */
@@ -4677,7 +4693,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
verbose(env, "cannot spill pointers to stack into stack frame of the caller\n");
return -EINVAL;
}
- save_register_state(state, spi, reg, size);
+ save_register_state(env, state, spi, reg, size);
} else {
u8 type = STACK_MISC;
@@ -4948,6 +4964,8 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
continue;
if (type == STACK_MISC)
continue;
+ if (type == STACK_ZERO)
+ continue;
if (type == STACK_INVALID && env->allow_uninit_stack)
continue;
verbose(env, "invalid read from stack off %d+%d size %d\n",
--
2.34.1
* [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore
2023-10-31 5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
` (4 preceding siblings ...)
2023-10-31 5:03 ` [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
@ 2023-10-31 5:03 ` Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
6 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:03 UTC
To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team
Similarly to the special handling of STACK_ZERO, when reading 1/2/4 bytes
from a stack slot that has a register spilled into it and that register has
a constant value of zero, preserve that zero and mark the spilled register as
precise for that. This makes the spilled const zero register and STACK_ZERO
cases equivalent in their behavior.
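A minimal sketch of the now-equivalent behavior (hand-written BPF assembly,
not taken from a selftest):

  r0 = 0;
  *(u64 *)(r10 - 8) = r0;  /* spill of a known-zero register */
  r1 = *(u32 *)(r10 - 8);  /* partial fill: r1 is known to be zero, */
                           /* just as if the slot were STACK_ZERO   */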
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/verifier.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0eecc6b3109c..8cfe060e4938 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4958,22 +4958,39 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
copy_register_state(&state->regs[dst_regno], reg);
state->regs[dst_regno].subreg_def = subreg_def;
} else {
+ int spill_cnt = 0, zero_cnt = 0;
+
for (i = 0; i < size; i++) {
type = stype[(slot - i) % BPF_REG_SIZE];
- if (type == STACK_SPILL)
+ if (type == STACK_SPILL) {
+ spill_cnt++;
continue;
+ }
if (type == STACK_MISC)
continue;
- if (type == STACK_ZERO)
+ if (type == STACK_ZERO) {
+ zero_cnt++;
continue;
+ }
if (type == STACK_INVALID && env->allow_uninit_stack)
continue;
verbose(env, "invalid read from stack off %d+%d size %d\n",
off, i, size);
return -EACCES;
}
- mark_reg_unknown(env, state->regs, dst_regno);
- insn_flags = 0; /* not restoring original register state */
+
+ if (spill_cnt == size &&
+ tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
+ __mark_reg_const_zero(&state->regs[dst_regno]);
+ /* this IS register fill, so keep insn_flags */
+ } else if (zero_cnt == size) {
+ /* similarly to mark_reg_stack_read(), preserve zeroes */
+ __mark_reg_const_zero(&state->regs[dst_regno]);
+ insn_flags = 0; /* not restoring original register state */
+ } else {
+ mark_reg_unknown(env, state->regs, dst_regno);
+ insn_flags = 0; /* not restoring original register state */
+ }
}
state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
} else if (dst_regno >= 0) {
--
2.34.1
* [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
2023-10-31 5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
` (5 preceding siblings ...)
2023-10-31 5:03 ` [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
@ 2023-10-31 5:03 ` Andrii Nakryiko
2023-10-31 5:22 ` Andrii Nakryiko
2023-11-09 15:21 ` Eduard Zingerman
6 siblings, 2 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:03 UTC
To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team
Now that precision backtracking supports register spill/fill to/from the
stack, there is another opportunity to be exploited here: minimizing precise
STACK_ZERO cases. With a simple code change we can rely on initially
imprecise register spill tracking for cases when the register spilled to the
stack was a known zero.
This is a very common case for initializing variables on the stack,
including rather large structures. Often zero has no special meaning for the
subsequent BPF program logic and is overwritten with non-zero values soon
afterwards. But due to STACK_ZERO vs STACK_MISC tracking, such initial zero
initialization actually causes duplication of verifier states, as STACK_ZERO
is clearly different from STACK_MISC or a spilled SCALAR_VALUE register.
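For instance, a typical pattern like the following (a hypothetical snippet;
struct config and compute_flags() are made up for illustration) used to
produce precise STACK_ZERO marks for every byte of cfg:

  struct config cfg;

  /* compiler typically emits r0 = 0 plus a series of 8-byte spills */
  __builtin_memset(&cfg, 0, sizeof(cfg));
  /* most zeroes are overwritten shortly afterwards anyway */
  cfg.flags = compute_flags();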
The effect of this (now) trivial change is huge, as can be seen below.
These are differences for BPF selftests, Cilium, and Meta-internal BPF
object files relative to the previous patch in this series. You can see
improvements ranging from single-digit percentages for instructions and
states all the way to 50-60% reductions for some Meta-internal host agent
programs, and even for some Cilium programs.
For Meta-internal ones I left only the differences for the largest BPF
object files by states/instructions, as there were too many differences in
the overall output. All the differences were improvements, reducing the
number of states and thus instructions validated.
Note that Meta-internal BPF object file names are not printed below.
The many copies of balancer_ingress are actually different configurations of
Katran, so they are different BPF programs, which explains the state
reduction going from -16% all the way to 31%, depending on BPF program logic
complexity.
SELFTESTS
=========
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
--------------------------------------- ----------------------- --------- --------- --------------- ---------- ---------- -------------
bpf_iter_netlink.bpf.linked3.o dump_netlink 148 104 -44 (-29.73%) 8 5 -3 (-37.50%)
bpf_iter_unix.bpf.linked3.o dump_unix 8474 8404 -70 (-0.83%) 151 147 -4 (-2.65%)
bpf_loop.bpf.linked3.o stack_check 560 324 -236 (-42.14%) 42 24 -18 (-42.86%)
local_storage_bench.bpf.linked3.o get_local 120 77 -43 (-35.83%) 9 6 -3 (-33.33%)
loop6.bpf.linked3.o trace_virtqueue_add_sgs 10167 9868 -299 (-2.94%) 226 206 -20 (-8.85%)
pyperf600_bpf_loop.bpf.linked3.o on_event 4872 3423 -1449 (-29.74%) 322 229 -93 (-28.88%)
strobemeta.bpf.linked3.o on_event 180697 176036 -4661 (-2.58%) 4780 4734 -46 (-0.96%)
test_cls_redirect.bpf.linked3.o cls_redirect 65594 65401 -193 (-0.29%) 4230 4212 -18 (-0.43%)
test_global_func_args.bpf.linked3.o test_cls 145 136 -9 (-6.21%) 10 9 -1 (-10.00%)
test_l4lb.bpf.linked3.o balancer_ingress 4760 2612 -2148 (-45.13%) 113 102 -11 (-9.73%)
test_l4lb_noinline.bpf.linked3.o balancer_ingress 4845 4877 +32 (+0.66%) 219 221 +2 (+0.91%)
test_l4lb_noinline_dynptr.bpf.linked3.o balancer_ingress 2072 2087 +15 (+0.72%) 97 98 +1 (+1.03%)
test_seg6_loop.bpf.linked3.o __add_egr_x 12440 9975 -2465 (-19.82%) 364 353 -11 (-3.02%)
test_tcp_hdr_options.bpf.linked3.o estab 2558 2572 +14 (+0.55%) 179 180 +1 (+0.56%)
test_xdp_dynptr.bpf.linked3.o _xdp_tx_iptunnel 645 596 -49 (-7.60%) 26 24 -2 (-7.69%)
test_xdp_noinline.bpf.linked3.o balancer_ingress_v6 3520 3516 -4 (-0.11%) 216 216 +0 (+0.00%)
xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82661 81241 -1420 (-1.72%) 5073 5155 +82 (+1.62%)
xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 84964 82297 -2667 (-3.14%) 5130 5157 +27 (+0.53%)
META-INTERNAL
=============
Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
-------------------------------------- --------- --------- ----------------- ---------- ---------- ---------------
balancer_ingress 27925 23608 -4317 (-15.46%) 1488 1482 -6 (-0.40%)
balancer_ingress 31824 27546 -4278 (-13.44%) 1658 1652 -6 (-0.36%)
balancer_ingress 32213 27935 -4278 (-13.28%) 1689 1683 -6 (-0.36%)
balancer_ingress 32213 27935 -4278 (-13.28%) 1689 1683 -6 (-0.36%)
balancer_ingress 31824 27546 -4278 (-13.44%) 1658 1652 -6 (-0.36%)
balancer_ingress 38647 29562 -9085 (-23.51%) 2069 1835 -234 (-11.31%)
balancer_ingress 38647 29562 -9085 (-23.51%) 2069 1835 -234 (-11.31%)
balancer_ingress 40339 30792 -9547 (-23.67%) 2193 1934 -259 (-11.81%)
balancer_ingress 37321 29055 -8266 (-22.15%) 1972 1795 -177 (-8.98%)
balancer_ingress 38176 29753 -8423 (-22.06%) 2008 1831 -177 (-8.81%)
balancer_ingress 29193 20910 -8283 (-28.37%) 1599 1422 -177 (-11.07%)
balancer_ingress 30013 21452 -8561 (-28.52%) 1645 1447 -198 (-12.04%)
balancer_ingress 28691 24290 -4401 (-15.34%) 1545 1531 -14 (-0.91%)
balancer_ingress 34223 28965 -5258 (-15.36%) 1984 1875 -109 (-5.49%)
balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%)
balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%)
balancer_ingress 35868 26455 -9413 (-26.24%) 2140 1827 -313 (-14.63%)
balancer_ingress 35868 26455 -9413 (-26.24%) 2140 1827 -313 (-14.63%)
balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%)
balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%)
balancer_ingress 34844 29485 -5359 (-15.38%) 2036 1918 -118 (-5.80%)
fbflow_egress 3256 2652 -604 (-18.55%) 218 192 -26 (-11.93%)
fbflow_ingress 1026 944 -82 (-7.99%) 70 63 -7 (-10.00%)
sslwall_tc_egress 8424 7360 -1064 (-12.63%) 498 458 -40 (-8.03%)
syar_accept_protect 15040 9539 -5501 (-36.58%) 364 220 -144 (-39.56%)
syar_connect_tcp_v6 15036 9535 -5501 (-36.59%) 360 216 -144 (-40.00%)
syar_connect_udp_v4 15039 9538 -5501 (-36.58%) 361 217 -144 (-39.89%)
syar_connect_connect4_protect4 24805 15833 -8972 (-36.17%) 756 480 -276 (-36.51%)
syar_lsm_file_open 167772 151813 -15959 (-9.51%) 1836 1667 -169 (-9.20%)
syar_namespace_create_new 14805 9304 -5501 (-37.16%) 353 209 -144 (-40.79%)
syar_python3_detect 17531 12030 -5501 (-31.38%) 391 247 -144 (-36.83%)
syar_ssh_post_fork 16412 10911 -5501 (-33.52%) 405 261 -144 (-35.56%)
syar_enter_execve 14728 9227 -5501 (-37.35%) 345 201 -144 (-41.74%)
syar_enter_execveat 14728 9227 -5501 (-37.35%) 345 201 -144 (-41.74%)
syar_exit_execve 16622 11121 -5501 (-33.09%) 376 232 -144 (-38.30%)
syar_exit_execveat 16622 11121 -5501 (-33.09%) 376 232 -144 (-38.30%)
syar_syscalls_kill 15288 9787 -5501 (-35.98%) 398 254 -144 (-36.18%)
syar_task_enter_pivot_root 14898 9397 -5501 (-36.92%) 357 213 -144 (-40.34%)
syar_syscalls_setreuid 16678 11177 -5501 (-32.98%) 429 285 -144 (-33.57%)
syar_syscalls_setuid 16678 11177 -5501 (-32.98%) 429 285 -144 (-33.57%)
syar_syscalls_process_vm_readv 14959 9458 -5501 (-36.77%) 364 220 -144 (-39.56%)
syar_syscalls_process_vm_writev 15757 10256 -5501 (-34.91%) 390 246 -144 (-36.92%)
do_uprobe 15519 10018 -5501 (-35.45%) 373 229 -144 (-38.61%)
edgewall 179715 55783 -123932 (-68.96%) 12607 3999 -8608 (-68.28%)
bictcp_state 7570 4131 -3439 (-45.43%) 496 269 -227 (-45.77%)
cubictcp_state 7570 4131 -3439 (-45.43%) 496 269 -227 (-45.77%)
tcp_rate_skb_delivered 447 272 -175 (-39.15%) 29 18 -11 (-37.93%)
kprobe__bbr_set_state 4566 2615 -1951 (-42.73%) 209 124 -85 (-40.67%)
kprobe__bictcp_state 4566 2615 -1951 (-42.73%) 209 124 -85 (-40.67%)
inet_sock_set_state 1501 1337 -164 (-10.93%) 93 85 -8 (-8.60%)
tcp_retransmit_skb 1145 981 -164 (-14.32%) 67 59 -8 (-11.94%)
tcp_retransmit_synack 1183 951 -232 (-19.61%) 67 55 -12 (-17.91%)
bpf_tcptuner 1459 1187 -272 (-18.64%) 99 80 -19 (-19.19%)
tw_egress 801 776 -25 (-3.12%) 69 66 -3 (-4.35%)
tw_ingress 795 770 -25 (-3.14%) 69 66 -3 (-4.35%)
ttls_tc_ingress 19025 19383 +358 (+1.88%) 470 465 -5 (-1.06%)
ttls_nat_egress 490 299 -191 (-38.98%) 33 20 -13 (-39.39%)
ttls_nat_ingress 448 285 -163 (-36.38%) 32 21 -11 (-34.38%)
tw_twfw_egress 511127 212071 -299056 (-58.51%) 16733 8504 -8229 (-49.18%)
tw_twfw_ingress 500095 212069 -288026 (-57.59%) 16223 8504 -7719 (-47.58%)
tw_twfw_tc_eg 511113 212064 -299049 (-58.51%) 16732 8504 -8228 (-49.18%)
tw_twfw_tc_in 500095 212069 -288026 (-57.59%) 16223 8504 -7719 (-47.58%)
tw_twfw_egress 12632 12435 -197 (-1.56%) 276 260 -16 (-5.80%)
tw_twfw_ingress 12631 12454 -177 (-1.40%) 278 261 -17 (-6.12%)
tw_twfw_tc_eg 12595 12435 -160 (-1.27%) 274 259 -15 (-5.47%)
tw_twfw_tc_in 12631 12454 -177 (-1.40%) 278 261 -17 (-6.12%)
tw_xdp_dump 266 209 -57 (-21.43%) 9 8 -1 (-11.11%)
CILIUM
=========
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
------------- -------------------------------- --------- --------- ---------------- ---------- ---------- --------------
bpf_host.o cil_to_netdev 6047 4578 -1469 (-24.29%) 362 249 -113 (-31.22%)
bpf_host.o handle_lxc_traffic 2227 1585 -642 (-28.83%) 156 103 -53 (-33.97%)
bpf_host.o tail_handle_ipv4_from_netdev 2244 1458 -786 (-35.03%) 163 106 -57 (-34.97%)
bpf_host.o tail_handle_nat_fwd_ipv4 21022 10479 -10543 (-50.15%) 1289 670 -619 (-48.02%)
bpf_host.o tail_handle_nat_fwd_ipv6 15433 11375 -4058 (-26.29%) 905 643 -262 (-28.95%)
bpf_host.o tail_ipv4_host_policy_ingress 2219 1367 -852 (-38.40%) 161 96 -65 (-40.37%)
bpf_host.o tail_nodeport_nat_egress_ipv4 22460 19862 -2598 (-11.57%) 1469 1293 -176 (-11.98%)
bpf_host.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%)
bpf_host.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%)
bpf_host.o tail_nodeport_nat_ipv6_egress 3702 3542 -160 (-4.32%) 215 205 -10 (-4.65%)
bpf_lxc.o tail_handle_nat_fwd_ipv4 21022 10479 -10543 (-50.15%) 1289 670 -619 (-48.02%)
bpf_lxc.o tail_handle_nat_fwd_ipv6 15433 11375 -4058 (-26.29%) 905 643 -262 (-28.95%)
bpf_lxc.o tail_ipv4_ct_egress 5073 3374 -1699 (-33.49%) 262 172 -90 (-34.35%)
bpf_lxc.o tail_ipv4_ct_ingress 5093 3385 -1708 (-33.54%) 262 172 -90 (-34.35%)
bpf_lxc.o tail_ipv4_ct_ingress_policy_only 5093 3385 -1708 (-33.54%) 262 172 -90 (-34.35%)
bpf_lxc.o tail_ipv6_ct_egress 4593 3878 -715 (-15.57%) 194 151 -43 (-22.16%)
bpf_lxc.o tail_ipv6_ct_ingress 4606 3891 -715 (-15.52%) 194 151 -43 (-22.16%)
bpf_lxc.o tail_ipv6_ct_ingress_policy_only 4606 3891 -715 (-15.52%) 194 151 -43 (-22.16%)
bpf_lxc.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%)
bpf_lxc.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%)
bpf_overlay.o tail_handle_nat_fwd_ipv4 20524 10114 -10410 (-50.72%) 1271 638 -633 (-49.80%)
bpf_overlay.o tail_nodeport_nat_egress_ipv4 22718 19490 -3228 (-14.21%) 1475 1275 -200 (-13.56%)
bpf_overlay.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%)
bpf_overlay.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%)
bpf_overlay.o tail_nodeport_nat_ipv6_egress 3638 3548 -90 (-2.47%) 209 203 -6 (-2.87%)
bpf_overlay.o tail_rev_nodeport_lb4 4368 3820 -548 (-12.55%) 248 215 -33 (-13.31%)
bpf_overlay.o tail_rev_nodeport_lb6 2867 2428 -439 (-15.31%) 167 140 -27 (-16.17%)
bpf_sock.o cil_sock6_connect 1718 1703 -15 (-0.87%) 100 99 -1 (-1.00%)
bpf_xdp.o tail_handle_nat_fwd_ipv4 12917 12443 -474 (-3.67%) 875 849 -26 (-2.97%)
bpf_xdp.o tail_handle_nat_fwd_ipv6 13515 13264 -251 (-1.86%) 715 702 -13 (-1.82%)
bpf_xdp.o tail_lb_ipv4 39492 36367 -3125 (-7.91%) 2430 2251 -179 (-7.37%)
bpf_xdp.o tail_lb_ipv6 80441 78058 -2383 (-2.96%) 3647 3523 -124 (-3.40%)
bpf_xdp.o tail_nodeport_ipv6_dsr 1038 901 -137 (-13.20%) 61 55 -6 (-9.84%)
bpf_xdp.o tail_nodeport_nat_egress_ipv4 13027 12096 -931 (-7.15%) 868 809 -59 (-6.80%)
bpf_xdp.o tail_nodeport_nat_ingress_ipv4 7617 5900 -1717 (-22.54%) 522 413 -109 (-20.88%)
bpf_xdp.o tail_nodeport_nat_ingress_ipv6 7575 7395 -180 (-2.38%) 383 374 -9 (-2.35%)
bpf_xdp.o tail_rev_nodeport_lb4 6808 6739 -69 (-1.01%) 403 396 -7 (-1.74%)
bpf_xdp.o tail_rev_nodeport_lb6 16173 15847 -326 (-2.02%) 1010 990 -20 (-1.98%)
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/verifier.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8cfe060e4938..e42ce974b106 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4668,8 +4668,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
return err;
mark_stack_slot_scratched(env, spi);
- if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
- !register_is_null(reg) && env->bpf_capable) {
+ if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) {
save_register_state(env, state, spi, reg, size);
/* Break the relation on a narrowing spill. */
if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
@@ -4718,7 +4717,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
/* when we zero initialize stack slots mark them as such */
if ((reg && register_is_null(reg)) ||
(!reg && is_bpf_st_mem(insn) && insn->imm == 0)) {
- /* backtracking doesn't work for STACK_ZERO yet. */
+ /* STACK_ZERO case happened because register spill
+ * wasn't properly aligned at the stack slot boundary,
+ * so it's not a register spill anymore; force
+ * originating register to be precise to make
+ * STACK_ZERO correct for subsequent states
+ */
err = mark_chain_precision(env, value_regno);
if (err)
return err;
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
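For context, the stack-zeroing pattern targeted by the hunk above typically
comes from plain C zero-initialization of stack variables. A minimal sketch
(program, section, and struct names are illustrative, not taken from the
series):

#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

struct event { long ts; int pid; char comm[16]; };	/* 32 bytes */

SEC("kprobe/do_sys_openat2")
int probe(void *ctx)
{
	/* clang lowers this to a zeroed register spilled slot by slot,
	 * roughly: r1 = 0; *(u64 *)(r10 - 32) = r1; ... (r10 - 8) = r1;
	 * with the hunk above, each aligned slot is tracked as a spill
	 * of a known-zero (initially imprecise) register rather than
	 * as STACK_ZERO
	 */
	struct event e = {};

	e.pid = 42;	/* the zeros are usually overwritten right away */
	return 0;
}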
* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
2023-10-31 5:03 ` [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
@ 2023-10-31 5:22 ` Andrii Nakryiko
2023-11-01 7:56 ` Jiri Olsa
2023-11-09 15:21 ` Eduard Zingerman
1 sibling, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-10-31 5:22 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team
On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Now that precision backtracking supports register spill/fill to/from
> the stack, there is another opportunity to be exploited here: minimizing
> precise STACK_ZERO cases. With a simple code change we can rely on
> initially imprecise register spill tracking for cases when the register
> spilled to the stack was a known zero.
>
> This is a very common case for initializing variables on the stack,
> including rather large structures. Oftentimes zero has no special
> meaning for the subsequent BPF program logic and is often overwritten
> with non-zero values soon afterwards. But due to STACK_ZERO vs
> STACK_MISC tracking, such zero initialization actually causes
> duplication of verifier states, as STACK_ZERO is clearly different
> from STACK_MISC or a spilled SCALAR_VALUE register.
>
> The effect of this (now) trivial change is huge, as can be seen below.
> These are differences between BPF selftests, Cilium, and Meta-internal
> BPF object files relative to previous patch in this series. You can see
> improvements ranging from single-digit percentage improvement for
> instructions and states, all the way to 50-60% reduction for some of
> Meta-internal host agent programs, and even some Cilium programs.
>
> For Meta-internal ones I left only the differences for the largest BPF
> object files by states/instructions, as there were too many differences
> in the overall output. All the differences were improvements, reducing
> the number of states and thus instructions validated.
>
> Note, Meta-internal BPF object file names are not printed below.
> Many copies of balancer_ingress are actually many different
> configurations of Katran, so they are different BPF programs, which
> explains state reduction going from -16% all the way to 31%, depending
> on BPF program logic complexity.
>
> [... SELFTESTS, META-INTERNAL, and CILIUM veristat tables trimmed; same
> data as in the patch message above ...]
>
So I also want to mention that while I did spot-check a few programs
(not the biggest ones) and they did seem to have correct verification
flow, I obviously can't easily validate verifier log_level=2 logs for
all of the changes above, especially those multi-thousand-state
programs. I'd really appreciate it if someone from Isovalent/Cilium
could sanity-check a Cilium program or two, just in case.
Thanks!
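For anyone who wants to reproduce this, a minimal loader sketch that dumps
the full log_level=2 verifier log through stock libbpf (standard libbpf
APIs; error handling mostly elided):

#include <stdio.h>
#include <stdarg.h>
#include <bpf/libbpf.h>

static int print_all(enum libbpf_print_level level, const char *fmt,
		     va_list args)
{
	/* pass everything through, including debug-level verifier log */
	return vfprintf(stderr, fmt, args);
}

int main(int argc, char **argv)
{
	/* ask the kernel for the detailed (log_level=2) verifier log */
	LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_level = 2);
	struct bpf_object *obj;

	libbpf_set_print(print_all);
	obj = bpf_object__open_file(argv[1], &opts);
	if (!obj)
		return 1;
	/* the log for each program is emitted during load */
	return bpf_object__load(obj) ? 1 : 0;
}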
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
2023-10-31 5:22 ` Andrii Nakryiko
@ 2023-11-01 7:56 ` Jiri Olsa
2023-11-01 16:27 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Jiri Olsa @ 2023-11-01 7:56 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Mon, Oct 30, 2023 at 10:22:48PM -0700, Andrii Nakryiko wrote:
> On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > [... commit message and veristat tables trimmed ...]
>
> So I also want to mention that while I did spot-check a few programs
> (not the biggest ones) and they did seem to have correct verification
> flow, I obviously can't easily validate verifier log_level=2 logs for
> all of the changes above, especially those multi-thousand-state
> programs. I'd really appreciate it if someone from Isovalent/Cilium
> could sanity-check a Cilium program or two, just in case.
> Thanks!
fyi, I was curious so I tried that on top of tetragon programs;
seems up and down, but verification time is mostly lower ;-)
jirka
---
$ veristat --compare veristat.old veristat.new
File Program Duration (us) (A) Duration (us) (B) Duration (us) (DIFF) Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) Peak states (A) Peak states (B) Peak states (DIFF)
------------------------------ ----------------------------- ----------------- ----------------- -------------------- --------- --------- ---------------- ---------- ---------- --------------- --------------- --------------- ------------------
bpf_cgroup_mkdir.o tg_tp_cgrp_mkdir 206 190 -16 (-7.77%) 581 581 +0 (+0.00%) 24 24 +0 (+0.00%) 24 24 +0 (+0.00%)
bpf_cgroup_release.o tg_tp_cgrp_release 114 104 -10 (-8.77%) 381 381 +0 (+0.00%) 13 13 +0 (+0.00%) 13 13 +0 (+0.00%)
bpf_cgroup_rmdir.o tg_tp_cgrp_rmdir 126 121 -5 (-3.97%) 381 381 +0 (+0.00%) 13 13 +0 (+0.00%) 13 13 +0 (+0.00%)
bpf_execve_bprm_commit_creds.o tg_kp_bprm_committing_creds 100 95 -5 (-5.00%) 163 163 +0 (+0.00%) 14 14 +0 (+0.00%) 14 14 +0 (+0.00%)
bpf_execve_event.o event_execve 12147 12843 +696 (+5.73%) 35096 34723 -373 (-1.06%) 2278 2251 -27 (-1.19%) 1110 1115 +5 (+0.45%)
bpf_execve_event.o execve_send 93 57 -36 (-38.71%) 82 82 +0 (+0.00%) 6 6 +0 (+0.00%) 6 6 +0 (+0.00%)
bpf_execve_event_v53.o event_execve 97457 98430 +973 (+1.00%) 245365 239363 -6002 (-2.45%) 15430 15334 -96 (-0.62%) 7994 7929 -65 (-0.81%)
bpf_execve_event_v53.o execve_send 52 54 +2 (+3.85%) 105 105 +0 (+0.00%) 5 5 +0 (+0.00%) 5 5 +0 (+0.00%)
bpf_execve_event_v61.o event_execve 6094 6059 -35 (-0.57%) 27456 26871 -585 (-2.13%) 671 636 -35 (-5.22%) 301 309 +8 (+2.66%)
bpf_execve_event_v61.o execve_send 66 69 +3 (+4.55%) 105 105 +0 (+0.00%) 5 5 +0 (+0.00%) 5 5 +0 (+0.00%)
bpf_exit.o event_exit 65 53 -12 (-18.46%) 94 94 +0 (+0.00%) 8 8 +0 (+0.00%) 8 8 +0 (+0.00%)
bpf_fork.o event_wake_up_new_task 179 209 +30 (+16.76%) 514 514 +0 (+0.00%) 30 30 +0 (+0.00%) 30 30 +0 (+0.00%)
bpf_generic_kprobe.o generic_fmodret_override 67 70 +3 (+4.48%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_actions 2386 1893 -493 (-20.66%) 6746 6746 +0 (+0.00%) 287 287 +0 (+0.00%) 207 207 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_event 302 306 +4 (+1.32%) 580 580 +0 (+0.00%) 47 47 +0 (+0.00%) 47 47 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg1 2679 2464 -215 (-8.03%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg2 2487 2777 +290 (+11.66%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg3 2905 2620 -285 (-9.81%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg4 2834 2706 -128 (-4.52%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg5 2771 2621 -150 (-5.41%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_output 44 41 -3 (-6.82%) 29 29 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_override 40 39 -1 (-2.50%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_process_event0 7817 7945 +128 (+1.64%) 21321 21001 -320 (-1.50%) 1440 1403 -37 (-2.57%) 906 889 -17 (-1.88%)
bpf_generic_kprobe.o generic_kprobe_process_event1 7239 7468 +229 (+3.16%) 19782 19681 -101 (-0.51%) 1348 1339 -9 (-0.67%) 888 884 -4 (-0.45%)
bpf_generic_kprobe.o generic_kprobe_process_event2 7415 7691 +276 (+3.72%) 19782 19680 -102 (-0.52%) 1348 1339 -9 (-0.67%) 888 884 -4 (-0.45%)
bpf_generic_kprobe.o generic_kprobe_process_event3 7581 7024 -557 (-7.35%) 19779 19680 -99 (-0.50%) 1348 1338 -10 (-0.74%) 888 883 -5 (-0.56%)
bpf_generic_kprobe.o generic_kprobe_process_event4 8016 7572 -444 (-5.54%) 19760 19658 -102 (-0.52%) 1355 1344 -11 (-0.81%) 891 885 -6 (-0.67%)
bpf_generic_kprobe.o generic_kprobe_process_filter 43093 31779 -11314 (-26.25%) 77948 66684 -11264 (-14.45%) 6048 5009 -1039 (-17.18%) 1678 1640 -38 (-2.26%)
bpf_generic_kprobe_v53.o generic_fmodret_override 64 66 +2 (+3.12%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_actions 23258 14115 -9143 (-39.31%) 42545 42545 +0 (+0.00%) 1434 1434 +0 (+0.00%) 378 378 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_event 298 303 +5 (+1.68%) 583 583 +0 (+0.00%) 47 47 +0 (+0.00%) 47 47 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg1 25215 26076 +861 (+3.41%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg2 24813 24288 -525 (-2.12%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg3 26494 24362 -2132 (-8.05%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg4 24373 24041 -332 (-1.36%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg5 26265 24317 -1948 (-7.42%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v53.o generic_kprobe_output 119 148 +29 (+24.37%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_override 38 39 +1 (+2.63%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event0 102334 101040 -1294 (-1.26%) 283295 283172 -123 (-0.04%) 16044 16033 -11 (-0.07%) 8123 8123 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event1 108349 106105 -2244 (-2.07%) 313458 315263 +1805 (+0.58%) 16524 16544 +20 (+0.12%) 8121 8123 +2 (+0.02%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event2 109991 105951 -4040 (-3.67%) 313458 315263 +1805 (+0.58%) 16524 16544 +20 (+0.12%) 8121 8123 +2 (+0.02%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event3 110279 109525 -754 (-0.68%) 313455 315260 +1805 (+0.58%) 16524 16544 +20 (+0.12%) 8121 8123 +2 (+0.02%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event4 106100 111486 +5386 (+5.08%) 296244 308555 +12311 (+4.16%) 16249 16386 +137 (+0.84%) 8116 8135 +19 (+0.23%)
bpf_generic_kprobe_v53.o generic_kprobe_process_filter 57465 54691 -2774 (-4.83%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%) 1525 1421 -104 (-6.82%)
bpf_generic_kprobe_v61.o generic_fmodret_override 94 89 -5 (-5.32%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_actions 15903 15072 -831 (-5.23%) 42545 42545 +0 (+0.00%) 1434 1434 +0 (+0.00%) 378 378 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_event 303 340 +37 (+12.21%) 583 583 +0 (+0.00%) 47 47 +0 (+0.00%) 47 47 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg1 25870 24169 -1701 (-6.58%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg2 26667 24070 -2597 (-9.74%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg3 27248 24758 -2490 (-9.14%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg4 27483 26107 -1376 (-5.01%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg5 26764 26316 -448 (-1.67%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_kprobe_v61.o generic_kprobe_output 153 149 -4 (-2.61%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_override 56 51 -5 (-8.93%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event0 11184 10303 -881 (-7.88%) 58564 49822 -8742 (-14.93%) 1243 1108 -135 (-10.86%) 547 534 -13 (-2.38%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event1 12683 14576 +1893 (+14.93%) 68450 75716 +7266 (+10.62%) 1477 1566 +89 (+6.03%) 550 538 -12 (-2.18%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event2 12822 14709 +1887 (+14.72%) 68450 75715 +7265 (+10.61%) 1477 1566 +89 (+6.03%) 550 538 -12 (-2.18%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event3 13016 15029 +2013 (+15.47%) 68447 75715 +7268 (+10.62%) 1477 1565 +88 (+5.96%) 550 537 -13 (-2.36%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event4 11141 12815 +1674 (+15.03%) 58981 74350 +15369 (+26.06%) 1292 1522 +230 (+17.80%) 552 558 +6 (+1.09%)
bpf_generic_kprobe_v61.o generic_kprobe_process_filter 57674 51652 -6022 (-10.44%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%) 1525 1421 -104 (-6.82%)
bpf_generic_retkprobe.o generic_retkprobe_event 11526 11239 -287 (-2.49%) 28282 28008 -274 (-0.97%) 1973 1949 -24 (-1.22%) 1168 1164 -4 (-0.34%)
bpf_generic_retkprobe_v53.o generic_retkprobe_event 108357 105058 -3299 (-3.04%) 231680 231505 -175 (-0.08%) 16131 16113 -18 (-0.11%) 8238 8235 -3 (-0.04%)
bpf_generic_retkprobe_v61.o generic_retkprobe_event 10694 11197 +503 (+4.70%) 24960 24775 -185 (-0.74%) 1854 1842 -12 (-0.65%) 656 648 -8 (-1.22%)
bpf_generic_tracepoint.o generic_tracepoint_actions 2259 1998 -261 (-11.55%) 6692 6692 +0 (+0.00%) 295 295 +0 (+0.00%) 224 224 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg1 2523 2569 +46 (+1.82%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg2 2853 2692 -161 (-5.64%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg3 2522 2902 +380 (+15.07%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg4 2538 2837 +299 (+11.78%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg5 2598 2640 +42 (+1.62%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_event 691 617 -74 (-10.71%) 1487 1487 +0 (+0.00%) 92 92 +0 (+0.00%) 92 92 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_event0 7566 8026 +460 (+6.08%) 20592 20479 -113 (-0.55%) 1421 1409 -12 (-0.84%) 870 867 -3 (-0.34%)
bpf_generic_tracepoint.o generic_tracepoint_event1 7347 9822 +2475 (+33.69%) 19782 19681 -101 (-0.51%) 1348 1339 -9 (-0.67%) 888 884 -4 (-0.45%)
bpf_generic_tracepoint.o generic_tracepoint_event2 7218 7804 +586 (+8.12%) 19782 19680 -102 (-0.52%) 1348 1339 -9 (-0.67%) 888 884 -4 (-0.45%)
bpf_generic_tracepoint.o generic_tracepoint_event3 7296 7587 +291 (+3.99%) 19779 19680 -99 (-0.50%) 1348 1338 -10 (-0.74%) 888 883 -5 (-0.56%)
bpf_generic_tracepoint.o generic_tracepoint_event4 7215 8109 +894 (+12.39%) 19760 19658 -102 (-0.52%) 1355 1344 -11 (-0.81%) 891 885 -6 (-0.67%)
bpf_generic_tracepoint.o generic_tracepoint_filter 41153 33891 -7262 (-17.65%) 77948 66684 -11264 (-14.45%) 6048 5009 -1039 (-17.18%) 1678 1640 -38 (-2.26%)
bpf_generic_tracepoint.o generic_tracepoint_output 41 36 -5 (-12.20%) 29 29 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_tracepoint_v53.o generic_tracepoint_actions 15139 14536 -603 (-3.98%) 41191 41191 +0 (+0.00%) 1397 1397 +0 (+0.00%) 390 390 +0 (+0.00%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg1 26569 23775 -2794 (-10.52%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg2 26853 24057 -2796 (-10.41%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg3 27067 24044 -3023 (-11.17%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg4 24410 23953 -457 (-1.87%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg5 30439 24792 -5647 (-18.55%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event 581 591 +10 (+1.72%) 1490 1490 +0 (+0.00%) 92 92 +0 (+0.00%) 92 92 +0 (+0.00%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event0 94250 96057 +1807 (+1.92%) 215685 215586 -99 (-0.05%) 14954 14938 -16 (-0.11%) 7900 7897 -3 (-0.04%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event1 93947 95801 +1854 (+1.97%) 215701 215602 -99 (-0.05%) 14955 14941 -14 (-0.09%) 7904 7899 -5 (-0.06%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event2 96306 95407 -899 (-0.93%) 215701 215602 -99 (-0.05%) 14955 14941 -14 (-0.09%) 7904 7899 -5 (-0.06%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event3 97718 90734 -6984 (-7.15%) 215698 215599 -99 (-0.05%) 14955 14941 -14 (-0.09%) 7904 7899 -5 (-0.06%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event4 97822 89913 -7909 (-8.09%) 215757 215704 -53 (-0.02%) 14951 14942 -9 (-0.06%) 7896 7897 +1 (+0.01%)
bpf_generic_tracepoint_v53.o generic_tracepoint_filter 64076 50012 -14064 (-21.95%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%) 1525 1421 -104 (-6.82%)
bpf_generic_tracepoint_v53.o generic_tracepoint_output 136 136 +0 (+0.00%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_tracepoint_v61.o generic_tracepoint_actions 16298 14731 -1567 (-9.61%) 41191 41191 +0 (+0.00%) 1397 1397 +0 (+0.00%) 390 390 +0 (+0.00%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg1 27534 23721 -3813 (-13.85%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg2 28248 24052 -4196 (-14.85%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg3 29118 24012 -5106 (-17.54%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg4 33309 23915 -9394 (-28.20%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg5 28057 24983 -3074 (-10.96%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event 555 531 -24 (-4.32%) 1490 1490 +0 (+0.00%) 92 92 +0 (+0.00%) 92 92 +0 (+0.00%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event0 2128 2058 -70 (-3.29%) 4403 4100 -303 (-6.88%) 326 305 -21 (-6.44%) 300 292 -8 (-2.67%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event1 1982 2028 +46 (+2.32%) 4409 4106 -303 (-6.87%) 328 304 -24 (-7.32%) 303 293 -10 (-3.30%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event2 2357 2054 -303 (-12.86%) 4409 4106 -303 (-6.87%) 328 304 -24 (-7.32%) 303 293 -10 (-3.30%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event3 2018 1835 -183 (-9.07%) 4406 4103 -303 (-6.88%) 328 304 -24 (-7.32%) 303 293 -10 (-3.30%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event4 2094 1910 -184 (-8.79%) 4396 4124 -272 (-6.19%) 323 304 -19 (-5.88%) 299 293 -6 (-2.01%)
bpf_generic_tracepoint_v61.o generic_tracepoint_filter 63620 50068 -13552 (-21.30%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%) 1525 1421 -104 (-6.82%)
bpf_generic_tracepoint_v61.o generic_tracepoint_output 120 141 +21 (+17.50%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_actions 1767 1928 +161 (+9.11%) 5702 5702 +0 (+0.00%) 248 248 +0 (+0.00%) 188 188 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_event 232 207 -25 (-10.78%) 429 429 +0 (+0.00%) 33 33 +0 (+0.00%) 33 33 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg1 2764 2832 +68 (+2.46%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg2 2639 2675 +36 (+1.36%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg3 3875 2529 -1346 (-34.74%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg4 2646 2540 -106 (-4.01%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg5 2510 2674 +164 (+6.53%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%) 448 448 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_output 41 39 -2 (-4.88%) 29 29 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_process_event0 7804 8154 +350 (+4.48%) 21063 20890 -173 (-0.82%) 1419 1400 -19 (-1.34%) 889 887 -2 (-0.22%)
bpf_generic_uprobe.o generic_uprobe_process_event1 8326 8041 -285 (-3.42%) 19782 19681 -101 (-0.51%) 1348 1339 -9 (-0.67%) 888 884 -4 (-0.45%)
bpf_generic_uprobe.o generic_uprobe_process_event2 8183 7016 -1167 (-14.26%) 19782 19680 -102 (-0.52%) 1348 1339 -9 (-0.67%) 888 884 -4 (-0.45%)
bpf_generic_uprobe.o generic_uprobe_process_event3 8127 6999 -1128 (-13.88%) 19779 19680 -99 (-0.50%) 1348 1338 -10 (-0.74%) 888 883 -5 (-0.56%)
bpf_generic_uprobe.o generic_uprobe_process_event4 8072 7185 -887 (-10.99%) 19760 19658 -102 (-0.52%) 1355 1344 -11 (-0.81%) 891 885 -6 (-0.67%)
bpf_generic_uprobe.o generic_uprobe_process_filter 40999 31572 -9427 (-22.99%) 77948 66684 -11264 (-14.45%) 6048 5009 -1039 (-17.18%) 1678 1640 -38 (-2.26%)
bpf_generic_uprobe_v53.o generic_uprobe_actions 14216 14310 +94 (+0.66%) 39443 39443 +0 (+0.00%) 1336 1336 +0 (+0.00%) 379 379 +0 (+0.00%)
bpf_generic_uprobe_v53.o generic_uprobe_event 236 223 -13 (-5.51%) 433 433 +0 (+0.00%) 33 33 +0 (+0.00%) 33 33 +0 (+0.00%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg1 28012 26052 -1960 (-7.00%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg2 27759 26451 -1308 (-4.71%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg3 27301 25856 -1445 (-5.29%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg4 26331 26187 -144 (-0.55%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg5 27284 26122 -1162 (-4.26%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v53.o generic_uprobe_output 148 144 -4 (-2.70%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event0 103254 90496 -12758 (-12.36%) 215852 215620 -232 (-0.11%) 14972 14952 -20 (-0.13%) 7905 7899 -6 (-0.08%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event1 104517 90211 -14306 (-13.69%) 215701 215602 -99 (-0.05%) 14955 14941 -14 (-0.09%) 7904 7899 -5 (-0.06%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event2 101025 90027 -10998 (-10.89%) 215701 215602 -99 (-0.05%) 14955 14941 -14 (-0.09%) 7904 7899 -5 (-0.06%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event3 99776 95596 -4180 (-4.19%) 215698 215599 -99 (-0.05%) 14955 14941 -14 (-0.09%) 7904 7899 -5 (-0.06%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event4 99896 96233 -3663 (-3.67%) 215757 215704 -53 (-0.02%) 14951 14942 -9 (-0.06%) 7896 7897 +1 (+0.01%)
bpf_generic_uprobe_v53.o generic_uprobe_process_filter 65621 56496 -9125 (-13.91%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%) 1525 1421 -104 (-6.82%)
bpf_generic_uprobe_v61.o generic_uprobe_actions 14050 14958 +908 (+6.46%) 39443 39443 +0 (+0.00%) 1336 1336 +0 (+0.00%) 379 379 +0 (+0.00%)
bpf_generic_uprobe_v61.o generic_uprobe_event 241 309 +68 (+28.22%) 433 433 +0 (+0.00%) 33 33 +0 (+0.00%) 33 33 +0 (+0.00%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg1 30324 26943 -3381 (-11.15%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg2 26755 26758 +3 (+0.01%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg3 28337 27992 -345 (-1.22%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg4 26332 27308 +976 (+3.71%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg5 27209 26780 -429 (-1.58%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_generic_uprobe_v61.o generic_uprobe_output 138 146 +8 (+5.80%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event0 2194 2133 -61 (-2.78%) 4395 4152 -243 (-5.53%) 329 312 -17 (-5.17%) 303 297 -6 (-1.98%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event1 1885 1832 -53 (-2.81%) 4409 4106 -303 (-6.87%) 328 304 -24 (-7.32%) 303 293 -10 (-3.30%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event2 2775 1966 -809 (-29.15%) 4409 4106 -303 (-6.87%) 328 304 -24 (-7.32%) 303 293 -10 (-3.30%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event3 3237 2004 -1233 (-38.09%) 4406 4103 -303 (-6.88%) 328 304 -24 (-7.32%) 303 293 -10 (-3.30%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event4 1950 2031 +81 (+4.15%) 4396 4124 -272 (-6.19%) 323 304 -19 (-5.88%) 299 293 -6 (-2.01%)
bpf_generic_uprobe_v61.o generic_uprobe_process_filter 62774 56727 -6047 (-9.63%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%) 1525 1421 -104 (-6.82%)
bpf_globals.o read_globals_test 0 0 +0 (+0.00%) 0 0 +0 (+0.00%) 0 0 +0 (+0.00%) 0 0 +0 (+0.00%)
bpf_killer.o killer 27 28 +1 (+3.70%) 33 33 +0 (+0.00%) 3 3 +0 (+0.00%) 3 3 +0 (+0.00%)
bpf_loader.o loader_kprobe 84 82 -2 (-2.38%) 144 144 +0 (+0.00%) 10 10 +0 (+0.00%) 10 10 +0 (+0.00%)
bpf_lseek.o test_lseek 54 41 -13 (-24.07%) 67 67 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_killer.o killer 22 22 +0 (+0.00%) 33 33 +0 (+0.00%) 3 3 +0 (+0.00%) 3 3 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_fmodret_override 108 73 -35 (-32.41%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_actions 29346 14095 -15251 (-51.97%) 42545 42545 +0 (+0.00%) 1434 1434 +0 (+0.00%) 378 378 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_event 339 345 +6 (+1.77%) 585 585 +0 (+0.00%) 48 48 +0 (+0.00%) 48 48 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg1 33490 23550 -9940 (-29.68%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg2 42586 24318 -18268 (-42.90%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg3 39256 24731 -14525 (-37.00%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg4 41607 23955 -17652 (-42.43%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg5 49382 24518 -24864 (-50.35%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v53.o generic_kprobe_output 185 128 -57 (-30.81%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_override 62 41 -21 (-33.87%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event0 113628 100702 -12926 (-11.38%) 283295 283172 -123 (-0.04%) 16044 16033 -11 (-0.07%) 8123 8123 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event1 132058 106791 -25267 (-19.13%) 313458 315263 +1805 (+0.58%) 16524 16544 +20 (+0.12%) 8121 8123 +2 (+0.02%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event2 122505 106459 -16046 (-13.10%) 313458 315263 +1805 (+0.58%) 16524 16544 +20 (+0.12%) 8121 8123 +2 (+0.02%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event3 127258 106633 -20625 (-16.21%) 313455 315260 +1805 (+0.58%) 16524 16544 +20 (+0.12%) 8121 8123 +2 (+0.02%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event4 121800 111903 -9897 (-8.13%) 296244 308555 +12311 (+4.16%) 16249 16386 +137 (+0.84%) 8116 8135 +19 (+0.23%)
bpf_multi_kprobe_v53.o generic_kprobe_process_filter 73918 54826 -19092 (-25.83%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%) 1525 1421 -104 (-6.82%)
bpf_multi_kprobe_v61.o generic_fmodret_override 71 91 +20 (+28.17%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_actions 16654 15088 -1566 (-9.40%) 42545 42545 +0 (+0.00%) 1434 1434 +0 (+0.00%) 378 378 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_event 517 278 -239 (-46.23%) 585 585 +0 (+0.00%) 48 48 +0 (+0.00%) 48 48 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg1 41140 26793 -14347 (-34.87%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg2 30326 26454 -3872 (-12.77%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg3 38517 24452 -14065 (-36.52%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg4 36157 24539 -11618 (-32.13%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg5 40673 25657 -15016 (-36.92%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%) 1726 1742 +16 (+0.93%)
bpf_multi_kprobe_v61.o generic_kprobe_output 153 150 -3 (-1.96%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_override 40 51 +11 (+27.50%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event0 17270 9818 -7452 (-43.15%) 58564 49822 -8742 (-14.93%) 1243 1108 -135 (-10.86%) 547 534 -13 (-2.38%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event1 16763 13670 -3093 (-18.45%) 68450 75716 +7266 (+10.62%) 1477 1566 +89 (+6.03%) 550 538 -12 (-2.18%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event2 14321 14000 -321 (-2.24%) 68450 75715 +7265 (+10.61%) 1477 1566 +89 (+6.03%) 550 538 -12 (-2.18%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event3 14824 13829 -995 (-6.71%) 68447 75715 +7268 (+10.62%) 1477 1565 +88 (+5.96%) 550 537 -13 (-2.36%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event4 14745 14029 -716 (-4.86%) 58981 74350 +15369 (+26.06%) 1292 1522 +230 (+17.80%) 552 558 +6 (+1.09%)
bpf_multi_kprobe_v61.o generic_kprobe_process_filter 73994 54979 -19015 (-25.70%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%) 1525 1421 -104 (-6.82%)
bpf_multi_retkprobe_v53.o generic_retkprobe_event 127625 110224 -17401 (-13.63%) 231631 231456 -175 (-0.08%) 16130 16112 -18 (-0.11%) 8239 8236 -3 (-0.04%)
bpf_multi_retkprobe_v61.o generic_retkprobe_event 12110 9753 -2357 (-19.46%) 24404 24110 -294 (-1.20%) 1859 1841 -18 (-0.97%) 658 647 -11 (-1.67%)
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
2023-11-01 7:56 ` Jiri Olsa
@ 2023-11-01 16:27 ` Andrii Nakryiko
2023-11-02 9:54 ` Jiri Olsa
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-01 16:27 UTC (permalink / raw)
To: Jiri Olsa; +Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Wed, Nov 1, 2023 at 12:56 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Oct 30, 2023 at 10:22:48PM -0700, Andrii Nakryiko wrote:
> > On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Now that precision backtracking supports register spill/fill to/from
> > > stack, there is another opportunity to be exploited here: minimizing
> > > precise STACK_ZERO cases. With a simple code change we can rely on
> > > initially imprecise register spill tracking for cases when the
> > > register spilled to stack was a known zero.
> > >
> > > This is a very common case for initializing variables on the stack,
> > > including rather large structures. Oftentimes zero has no special
> > > meaning for the subsequent BPF program logic and is overwritten
> > > with non-zero values soon afterwards. But due to STACK_ZERO vs
> > > STACK_MISC tracking, such initial zero initialization actually causes
> > > duplication of verifier states, as STACK_ZERO is clearly different
> > > from STACK_MISC or a spilled SCALAR_VALUE register.
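
As a concrete illustration of the pattern described above (a sketch with
made-up names, not code from the patch set): the compiler loads 0 into a
register once and then stores that same register across the stack, so
every slot ends up holding a spilled known-zero register:

    struct event { long pid, uid, flags, ts; };

    int handle(void *ctx)
    {
            struct event e;

            /* compiles to r1 = 0 followed by a series of stack spills
             * of that known-zero register; `struct event e = {};`
             * produces the same stores
             */
            __builtin_memset(&e, 0, sizeof(e));

            /* fields are typically overwritten before any precise use */
            e.pid = 42;
            return 0;
    }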
> > >
> > > The effect of this (now) trivial change is huge, as can be seen below.
> > > These are differences for BPF selftests, Cilium, and Meta-internal
> > > BPF object files relative to the previous patch in this series. You
> > > can see improvements ranging from single-digit percentages for
> > > instructions and states all the way to 50-60% reductions for some of
> > > the Meta-internal host agent programs, and even some Cilium programs.
> > >
> > > For the Meta-internal ones I kept only the differences for the
> > > largest BPF object files by states/instructions, as there were too
> > > many differences in the overall output. All the differences were
> > > improvements, reducing the number of states and thus instructions
> > > validated.
> > >
> > > Note, Meta-internal BPF object file names are not printed below.
> > > Many copies of balancer_ingress are actually many different
> > > configurations of Katran, so they are different BPF programs, which
> > > explains state reduction ranging from -16% all the way to -31%,
> > > depending on BPF program logic complexity.
> > >
> > > SELFTESTS
> > > =========
> > > File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
> > > --------------------------------------- ----------------------- --------- --------- --------------- ---------- ---------- -------------
> > > bpf_iter_netlink.bpf.linked3.o dump_netlink 148 104 -44 (-29.73%) 8 5 -3 (-37.50%)
> > > bpf_iter_unix.bpf.linked3.o dump_unix 8474 8404 -70 (-0.83%) 151 147 -4 (-2.65%)
> > > bpf_loop.bpf.linked3.o stack_check 560 324 -236 (-42.14%) 42 24 -18 (-42.86%)
> > > local_storage_bench.bpf.linked3.o get_local 120 77 -43 (-35.83%) 9 6 -3 (-33.33%)
> > > loop6.bpf.linked3.o trace_virtqueue_add_sgs 10167 9868 -299 (-2.94%) 226 206 -20 (-8.85%)
> > > pyperf600_bpf_loop.bpf.linked3.o on_event 4872 3423 -1449 (-29.74%) 322 229 -93 (-28.88%)
> > > strobemeta.bpf.linked3.o on_event 180697 176036 -4661 (-2.58%) 4780 4734 -46 (-0.96%)
> > > test_cls_redirect.bpf.linked3.o cls_redirect 65594 65401 -193 (-0.29%) 4230 4212 -18 (-0.43%)
> > > test_global_func_args.bpf.linked3.o test_cls 145 136 -9 (-6.21%) 10 9 -1 (-10.00%)
> > > test_l4lb.bpf.linked3.o balancer_ingress 4760 2612 -2148 (-45.13%) 113 102 -11 (-9.73%)
> > > test_l4lb_noinline.bpf.linked3.o balancer_ingress 4845 4877 +32 (+0.66%) 219 221 +2 (+0.91%)
> > > test_l4lb_noinline_dynptr.bpf.linked3.o balancer_ingress 2072 2087 +15 (+0.72%) 97 98 +1 (+1.03%)
> > > test_seg6_loop.bpf.linked3.o __add_egr_x 12440 9975 -2465 (-19.82%) 364 353 -11 (-3.02%)
> > > test_tcp_hdr_options.bpf.linked3.o estab 2558 2572 +14 (+0.55%) 179 180 +1 (+0.56%)
> > > test_xdp_dynptr.bpf.linked3.o _xdp_tx_iptunnel 645 596 -49 (-7.60%) 26 24 -2 (-7.69%)
> > > test_xdp_noinline.bpf.linked3.o balancer_ingress_v6 3520 3516 -4 (-0.11%) 216 216 +0 (+0.00%)
> > > xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82661 81241 -1420 (-1.72%) 5073 5155 +82 (+1.62%)
> > > xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 84964 82297 -2667 (-3.14%) 5130 5157 +27 (+0.53%)
> > >
> > > META-INTERNAL
> > > =============
> > > Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
> > > -------------------------------------- --------- --------- ----------------- ---------- ---------- ---------------
> > > balancer_ingress 27925 23608 -4317 (-15.46%) 1488 1482 -6 (-0.40%)
> > > balancer_ingress 31824 27546 -4278 (-13.44%) 1658 1652 -6 (-0.36%)
> > > balancer_ingress 32213 27935 -4278 (-13.28%) 1689 1683 -6 (-0.36%)
> > > balancer_ingress 32213 27935 -4278 (-13.28%) 1689 1683 -6 (-0.36%)
> > > balancer_ingress 31824 27546 -4278 (-13.44%) 1658 1652 -6 (-0.36%)
> > > balancer_ingress 38647 29562 -9085 (-23.51%) 2069 1835 -234 (-11.31%)
> > > balancer_ingress 38647 29562 -9085 (-23.51%) 2069 1835 -234 (-11.31%)
> > > balancer_ingress 40339 30792 -9547 (-23.67%) 2193 1934 -259 (-11.81%)
> > > balancer_ingress 37321 29055 -8266 (-22.15%) 1972 1795 -177 (-8.98%)
> > > balancer_ingress 38176 29753 -8423 (-22.06%) 2008 1831 -177 (-8.81%)
> > > balancer_ingress 29193 20910 -8283 (-28.37%) 1599 1422 -177 (-11.07%)
> > > balancer_ingress 30013 21452 -8561 (-28.52%) 1645 1447 -198 (-12.04%)
> > > balancer_ingress 28691 24290 -4401 (-15.34%) 1545 1531 -14 (-0.91%)
> > > balancer_ingress 34223 28965 -5258 (-15.36%) 1984 1875 -109 (-5.49%)
> > > balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%)
> > > balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%)
> > > balancer_ingress 35868 26455 -9413 (-26.24%) 2140 1827 -313 (-14.63%)
> > > balancer_ingress 35868 26455 -9413 (-26.24%) 2140 1827 -313 (-14.63%)
> > > balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%)
> > > balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%)
> > > balancer_ingress 34844 29485 -5359 (-15.38%) 2036 1918 -118 (-5.80%)
> > > fbflow_egress 3256 2652 -604 (-18.55%) 218 192 -26 (-11.93%)
> > > fbflow_ingress 1026 944 -82 (-7.99%) 70 63 -7 (-10.00%)
> > > sslwall_tc_egress 8424 7360 -1064 (-12.63%) 498 458 -40 (-8.03%)
> > > syar_accept_protect 15040 9539 -5501 (-36.58%) 364 220 -144 (-39.56%)
> > > syar_connect_tcp_v6 15036 9535 -5501 (-36.59%) 360 216 -144 (-40.00%)
> > > syar_connect_udp_v4 15039 9538 -5501 (-36.58%) 361 217 -144 (-39.89%)
> > > syar_connect_connect4_protect4 24805 15833 -8972 (-36.17%) 756 480 -276 (-36.51%)
> > > syar_lsm_file_open 167772 151813 -15959 (-9.51%) 1836 1667 -169 (-9.20%)
> > > syar_namespace_create_new 14805 9304 -5501 (-37.16%) 353 209 -144 (-40.79%)
> > > syar_python3_detect 17531 12030 -5501 (-31.38%) 391 247 -144 (-36.83%)
> > > syar_ssh_post_fork 16412 10911 -5501 (-33.52%) 405 261 -144 (-35.56%)
> > > syar_enter_execve 14728 9227 -5501 (-37.35%) 345 201 -144 (-41.74%)
> > > syar_enter_execveat 14728 9227 -5501 (-37.35%) 345 201 -144 (-41.74%)
> > > syar_exit_execve 16622 11121 -5501 (-33.09%) 376 232 -144 (-38.30%)
> > > syar_exit_execveat 16622 11121 -5501 (-33.09%) 376 232 -144 (-38.30%)
> > > syar_syscalls_kill 15288 9787 -5501 (-35.98%) 398 254 -144 (-36.18%)
> > > syar_task_enter_pivot_root 14898 9397 -5501 (-36.92%) 357 213 -144 (-40.34%)
> > > syar_syscalls_setreuid 16678 11177 -5501 (-32.98%) 429 285 -144 (-33.57%)
> > > syar_syscalls_setuid 16678 11177 -5501 (-32.98%) 429 285 -144 (-33.57%)
> > > syar_syscalls_process_vm_readv 14959 9458 -5501 (-36.77%) 364 220 -144 (-39.56%)
> > > syar_syscalls_process_vm_writev 15757 10256 -5501 (-34.91%) 390 246 -144 (-36.92%)
> > > do_uprobe 15519 10018 -5501 (-35.45%) 373 229 -144 (-38.61%)
> > > edgewall 179715 55783 -123932 (-68.96%) 12607 3999 -8608 (-68.28%)
> > > bictcp_state 7570 4131 -3439 (-45.43%) 496 269 -227 (-45.77%)
> > > cubictcp_state 7570 4131 -3439 (-45.43%) 496 269 -227 (-45.77%)
> > > tcp_rate_skb_delivered 447 272 -175 (-39.15%) 29 18 -11 (-37.93%)
> > > kprobe__bbr_set_state 4566 2615 -1951 (-42.73%) 209 124 -85 (-40.67%)
> > > kprobe__bictcp_state 4566 2615 -1951 (-42.73%) 209 124 -85 (-40.67%)
> > > inet_sock_set_state 1501 1337 -164 (-10.93%) 93 85 -8 (-8.60%)
> > > tcp_retransmit_skb 1145 981 -164 (-14.32%) 67 59 -8 (-11.94%)
> > > tcp_retransmit_synack 1183 951 -232 (-19.61%) 67 55 -12 (-17.91%)
> > > bpf_tcptuner 1459 1187 -272 (-18.64%) 99 80 -19 (-19.19%)
> > > tw_egress 801 776 -25 (-3.12%) 69 66 -3 (-4.35%)
> > > tw_ingress 795 770 -25 (-3.14%) 69 66 -3 (-4.35%)
> > > ttls_tc_ingress 19025 19383 +358 (+1.88%) 470 465 -5 (-1.06%)
> > > ttls_nat_egress 490 299 -191 (-38.98%) 33 20 -13 (-39.39%)
> > > ttls_nat_ingress 448 285 -163 (-36.38%) 32 21 -11 (-34.38%)
> > > tw_twfw_egress 511127 212071 -299056 (-58.51%) 16733 8504 -8229 (-49.18%)
> > > tw_twfw_ingress 500095 212069 -288026 (-57.59%) 16223 8504 -7719 (-47.58%)
> > > tw_twfw_tc_eg 511113 212064 -299049 (-58.51%) 16732 8504 -8228 (-49.18%)
> > > tw_twfw_tc_in 500095 212069 -288026 (-57.59%) 16223 8504 -7719 (-47.58%)
> > > tw_twfw_egress 12632 12435 -197 (-1.56%) 276 260 -16 (-5.80%)
> > > tw_twfw_ingress 12631 12454 -177 (-1.40%) 278 261 -17 (-6.12%)
> > > tw_twfw_tc_eg 12595 12435 -160 (-1.27%) 274 259 -15 (-5.47%)
> > > tw_twfw_tc_in 12631 12454 -177 (-1.40%) 278 261 -17 (-6.12%)
> > > tw_xdp_dump 266 209 -57 (-21.43%) 9 8 -1 (-11.11%)
> > >
> > > CILIUM
> > > =========
> > > File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
> > > ------------- -------------------------------- --------- --------- ---------------- ---------- ---------- --------------
> > > bpf_host.o cil_to_netdev 6047 4578 -1469 (-24.29%) 362 249 -113 (-31.22%)
> > > bpf_host.o handle_lxc_traffic 2227 1585 -642 (-28.83%) 156 103 -53 (-33.97%)
> > > bpf_host.o tail_handle_ipv4_from_netdev 2244 1458 -786 (-35.03%) 163 106 -57 (-34.97%)
> > > bpf_host.o tail_handle_nat_fwd_ipv4 21022 10479 -10543 (-50.15%) 1289 670 -619 (-48.02%)
> > > bpf_host.o tail_handle_nat_fwd_ipv6 15433 11375 -4058 (-26.29%) 905 643 -262 (-28.95%)
> > > bpf_host.o tail_ipv4_host_policy_ingress 2219 1367 -852 (-38.40%) 161 96 -65 (-40.37%)
> > > bpf_host.o tail_nodeport_nat_egress_ipv4 22460 19862 -2598 (-11.57%) 1469 1293 -176 (-11.98%)
> > > bpf_host.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%)
> > > bpf_host.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%)
> > > bpf_host.o tail_nodeport_nat_ipv6_egress 3702 3542 -160 (-4.32%) 215 205 -10 (-4.65%)
> > > bpf_lxc.o tail_handle_nat_fwd_ipv4 21022 10479 -10543 (-50.15%) 1289 670 -619 (-48.02%)
> > > bpf_lxc.o tail_handle_nat_fwd_ipv6 15433 11375 -4058 (-26.29%) 905 643 -262 (-28.95%)
> > > bpf_lxc.o tail_ipv4_ct_egress 5073 3374 -1699 (-33.49%) 262 172 -90 (-34.35%)
> > > bpf_lxc.o tail_ipv4_ct_ingress 5093 3385 -1708 (-33.54%) 262 172 -90 (-34.35%)
> > > bpf_lxc.o tail_ipv4_ct_ingress_policy_only 5093 3385 -1708 (-33.54%) 262 172 -90 (-34.35%)
> > > bpf_lxc.o tail_ipv6_ct_egress 4593 3878 -715 (-15.57%) 194 151 -43 (-22.16%)
> > > bpf_lxc.o tail_ipv6_ct_ingress 4606 3891 -715 (-15.52%) 194 151 -43 (-22.16%)
> > > bpf_lxc.o tail_ipv6_ct_ingress_policy_only 4606 3891 -715 (-15.52%) 194 151 -43 (-22.16%)
> > > bpf_lxc.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%)
> > > bpf_lxc.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%)
> > > bpf_overlay.o tail_handle_nat_fwd_ipv4 20524 10114 -10410 (-50.72%) 1271 638 -633 (-49.80%)
> > > bpf_overlay.o tail_nodeport_nat_egress_ipv4 22718 19490 -3228 (-14.21%) 1475 1275 -200 (-13.56%)
> > > bpf_overlay.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%)
> > > bpf_overlay.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%)
> > > bpf_overlay.o tail_nodeport_nat_ipv6_egress 3638 3548 -90 (-2.47%) 209 203 -6 (-2.87%)
> > > bpf_overlay.o tail_rev_nodeport_lb4 4368 3820 -548 (-12.55%) 248 215 -33 (-13.31%)
> > > bpf_overlay.o tail_rev_nodeport_lb6 2867 2428 -439 (-15.31%) 167 140 -27 (-16.17%)
> > > bpf_sock.o cil_sock6_connect 1718 1703 -15 (-0.87%) 100 99 -1 (-1.00%)
> > > bpf_xdp.o tail_handle_nat_fwd_ipv4 12917 12443 -474 (-3.67%) 875 849 -26 (-2.97%)
> > > bpf_xdp.o tail_handle_nat_fwd_ipv6 13515 13264 -251 (-1.86%) 715 702 -13 (-1.82%)
> > > bpf_xdp.o tail_lb_ipv4 39492 36367 -3125 (-7.91%) 2430 2251 -179 (-7.37%)
> > > bpf_xdp.o tail_lb_ipv6 80441 78058 -2383 (-2.96%) 3647 3523 -124 (-3.40%)
> > > bpf_xdp.o tail_nodeport_ipv6_dsr 1038 901 -137 (-13.20%) 61 55 -6 (-9.84%)
> > > bpf_xdp.o tail_nodeport_nat_egress_ipv4 13027 12096 -931 (-7.15%) 868 809 -59 (-6.80%)
> > > bpf_xdp.o tail_nodeport_nat_ingress_ipv4 7617 5900 -1717 (-22.54%) 522 413 -109 (-20.88%)
> > > bpf_xdp.o tail_nodeport_nat_ingress_ipv6 7575 7395 -180 (-2.38%) 383 374 -9 (-2.35%)
> > > bpf_xdp.o tail_rev_nodeport_lb4 6808 6739 -69 (-1.01%) 403 396 -7 (-1.74%)
> > > bpf_xdp.o tail_rev_nodeport_lb6 16173 15847 -326 (-2.02%) 1010 990 -20 (-1.98%)
> > >
> >
> > So I also want to mention that while I did spot-check a few programs
> > (not the biggest ones) and they did seem to have correct verification
> > flow, I obviously can't easily validate verifier log_level=2 logs for
> > all of the changes above, especially those multi-thousand-state
> > programs. I'd really appreciate someone from Isovalent/Cilium doing
> > a sanity check of a Cilium program or two, just in case.
> > Thanks!
>
> fyi, I was curious so tried that on top of tetragon programs,
> seems up and down, but verification time is mostly lower ;-)
>
Nice! Can you please regenerate the results and sort by either insn_diff
(absolute difference, not percentage) or states_diff? It would be
easier to see the top 10 improvements and regressions that way.
Percentages by themselves can be misleading.
Oh, and peak states are probably not that useful, so maybe just use
`-e file,prog,duration,insns,states -s insns_diff`?
> jirka
>
>
> ---
> $ veristat --compare veristat.old veristat.new
>
> File Program Duration (us) (A) Duration (us) (B) Duration (us) (DIFF) Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) Peak states (A) Peak states (B) Peak states (DIFF)
> ------------------------------ ----------------------------- ----------------- ----------------- -------------------- --------- --------- ---------------- ---------- ---------- --------------- --------------- --------------- ------------------
> bpf_cgroup_mkdir.o tg_tp_cgrp_mkdir 206 190 -16 (-7.77%) 581 581 +0 (+0.00%) 24 24 +0 (+0.00%) 24 24 +0 (+0.00%)
> bpf_cgroup_release.o tg_tp_cgrp_release 114 104 -10 (-8.77%) 381 381 +0 (+0.00%) 13 13 +0 (+0.00%) 13 13 +0 (+0.00%)
> bpf_cgroup_rmdir.o tg_tp_cgrp_rmdir 126 121 -5 (-3.97%) 381 381 +0 (+0.00%) 13 13 +0 (+0.00%) 13 13 +0 (+0.00%)
> bpf_execve_bprm_commit_creds.o tg_kp_bprm_committing_creds 100 95 -5 (-5.00%) 163 163 +0 (+0.00%) 14 14 +0 (+0.00%) 14 14 +0 (+0.00%)
> bpf_execve_event.o event_execve 12147 12843 +696 (+5.73%) 35096 34723 -373 (-1.06%) 2278 2251 -27 (-1.19%) 1110 1115 +5 (+0.45%)
> bpf_execve_event.o execve_send 93 57 -36 (-38.71%) 82 82 +0 (+0.00%) 6 6 +0 (+0.00%) 6 6 +0 (+0.00%)
> bpf_execve_event_v53.o event_execve 97457 98430 +973 (+1.00%) 245365 239363 -6002 (-2.45%) 15430 15334 -96 (-0.62%) 7994 7929 -65 (-0.81%)
> bpf_execve_event_v53.o execve_send 52 54 +2 (+3.85%) 105 105 +0 (+0.00%) 5 5 +0 (+0.00%) 5 5 +0 (+0.00%)
> bpf_execve_event_v61.o event_execve 6094 6059 -35 (-0.57%) 27456 26871 -585 (-2.13%) 671 636 -35 (-5.22%) 301 309 +8 (+2.66%)
> bpf_execve_event_v61.o execve_send 66 69 +3 (+4.55%) 105 105 +0 (+0.00%) 5 5 +0 (+0.00%) 5 5 +0 (+0.00%)
> bpf_exit.o event_exit 65 53 -12 (-18.46%) 94 94 +0 (+0.00%) 8 8 +0 (+0.00%) 8 8 +0 (+0.00%)
> bpf_fork.o event_wake_up_new_task 179 209 +30 (+16.76%) 514 514 +0 (+0.00%) 30 30 +0 (+0.00%) 30 30 +0 (+0.00%)
[...]
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
2023-11-01 16:27 ` Andrii Nakryiko
@ 2023-11-02 9:54 ` Jiri Olsa
0 siblings, 0 replies; 45+ messages in thread
From: Jiri Olsa @ 2023-11-02 9:54 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Jiri Olsa, Andrii Nakryiko, bpf, ast, daniel, martin.lau,
kernel-team
On Wed, Nov 01, 2023 at 09:27:21AM -0700, Andrii Nakryiko wrote:
> On Wed, Nov 1, 2023 at 12:56 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Oct 30, 2023 at 10:22:48PM -0700, Andrii Nakryiko wrote:
> > > On Mon, Oct 30, 2023 at 10:03 PM Andrii Nakryiko <andrii@kernel.org> wrote:
[...]
> >
> > fyi, I was curious so tried that on top of tetragon programs,
> > seems up and down, but verification time is mostly lower ;-)
> >
>
> Nice! Can you please regenerate the results and sort by either insn_diff
> (absolute difference, not percentage) or states_diff? It would be
> easier to see the top 10 improvements and regressions that way.
> Percentages by themselves can be misleading.
>
> Oh, and peak states are probably not that useful, so maybe just use
> `-e file,prog,duration,insns,states -s insns_diff`?
$ veristat --compare ./veristat.old ./veristat.new -e file,prog,duration,insns,states --sort insns_diff
File Program Duration (us) (A) Duration (us) (B) Duration (us) (DIFF) Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
------------------------------ ----------------------------- ----------------- ----------------- -------------------- --------- --------- ---------------- ---------- ---------- ---------------
bpf_generic_kprobe_v61.o generic_kprobe_process_event4 11141 12815 +1674 (+15.03%) 58981 74350 +15369 (+26.06%) 1292 1522 +230 (+17.80%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event4 14745 14029 -716 (-4.86%) 58981 74350 +15369 (+26.06%) 1292 1522 +230 (+17.80%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event4 106100 111486 +5386 (+5.08%) 296244 308555 +12311 (+4.16%) 16249 16386 +137 (+0.84%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event4 121800 111903 -9897 (-8.13%) 296244 308555 +12311 (+4.16%) 16249 16386 +137 (+0.84%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event3 13016 15029 +2013 (+15.47%) 68447 75715 +7268 (+10.62%) 1477 1565 +88 (+5.96%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event3 14824 13829 -995 (-6.71%) 68447 75715 +7268 (+10.62%) 1477 1565 +88 (+5.96%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event1 12683 14576 +1893 (+14.93%) 68450 75716 +7266 (+10.62%) 1477 1566 +89 (+6.03%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event1 16763 13670 -3093 (-18.45%) 68450 75716 +7266 (+10.62%) 1477 1566 +89 (+6.03%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event2 12822 14709 +1887 (+14.72%) 68450 75715 +7265 (+10.61%) 1477 1566 +89 (+6.03%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event2 14321 14000 -321 (-2.24%) 68450 75715 +7265 (+10.61%) 1477 1566 +89 (+6.03%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event1 108349 106105 -2244 (-2.07%) 313458 315263 +1805 (+0.58%) 16524 16544 +20 (+0.12%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event2 109991 105951 -4040 (-3.67%) 313458 315263 +1805 (+0.58%) 16524 16544 +20 (+0.12%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event3 110279 109525 -754 (-0.68%) 313455 315260 +1805 (+0.58%) 16524 16544 +20 (+0.12%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event1 132058 106791 -25267 (-19.13%) 313458 315263 +1805 (+0.58%) 16524 16544 +20 (+0.12%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event2 122505 106459 -16046 (-13.10%) 313458 315263 +1805 (+0.58%) 16524 16544 +20 (+0.12%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event3 127258 106633 -20625 (-16.21%) 313455 315260 +1805 (+0.58%) 16524 16544 +20 (+0.12%)
bpf_cgroup_mkdir.o tg_tp_cgrp_mkdir 206 190 -16 (-7.77%) 581 581 +0 (+0.00%) 24 24 +0 (+0.00%)
bpf_cgroup_release.o tg_tp_cgrp_release 114 104 -10 (-8.77%) 381 381 +0 (+0.00%) 13 13 +0 (+0.00%)
bpf_cgroup_rmdir.o tg_tp_cgrp_rmdir 126 121 -5 (-3.97%) 381 381 +0 (+0.00%) 13 13 +0 (+0.00%)
bpf_execve_bprm_commit_creds.o tg_kp_bprm_committing_creds 100 95 -5 (-5.00%) 163 163 +0 (+0.00%) 14 14 +0 (+0.00%)
bpf_execve_event.o execve_send 93 57 -36 (-38.71%) 82 82 +0 (+0.00%) 6 6 +0 (+0.00%)
bpf_execve_event_v53.o execve_send 52 54 +2 (+3.85%) 105 105 +0 (+0.00%) 5 5 +0 (+0.00%)
bpf_execve_event_v61.o execve_send 66 69 +3 (+4.55%) 105 105 +0 (+0.00%) 5 5 +0 (+0.00%)
bpf_exit.o event_exit 65 53 -12 (-18.46%) 94 94 +0 (+0.00%) 8 8 +0 (+0.00%)
bpf_fork.o event_wake_up_new_task 179 209 +30 (+16.76%) 514 514 +0 (+0.00%) 30 30 +0 (+0.00%)
bpf_generic_kprobe.o generic_fmodret_override 67 70 +3 (+4.48%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_actions 2386 1893 -493 (-20.66%) 6746 6746 +0 (+0.00%) 287 287 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_event 302 306 +4 (+1.32%) 580 580 +0 (+0.00%) 47 47 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg1 2679 2464 -215 (-8.03%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg2 2487 2777 +290 (+11.66%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg3 2905 2620 -285 (-9.81%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg4 2834 2706 -128 (-4.52%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_filter_arg5 2771 2621 -150 (-5.41%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_output 44 41 -3 (-6.82%) 29 29 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe.o generic_kprobe_override 40 39 -1 (-2.50%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_fmodret_override 64 66 +2 (+3.12%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_actions 23258 14115 -9143 (-39.31%) 42545 42545 +0 (+0.00%) 1434 1434 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_event 298 303 +5 (+1.68%) 583 583 +0 (+0.00%) 47 47 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_output 119 148 +29 (+24.37%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_kprobe_v53.o generic_kprobe_override 38 39 +1 (+2.63%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_fmodret_override 94 89 -5 (-5.32%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_actions 15903 15072 -831 (-5.23%) 42545 42545 +0 (+0.00%) 1434 1434 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_event 303 340 +37 (+12.21%) 583 583 +0 (+0.00%) 47 47 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_output 153 149 -4 (-2.61%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_kprobe_v61.o generic_kprobe_override 56 51 -5 (-8.93%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_actions 2259 1998 -261 (-11.55%) 6692 6692 +0 (+0.00%) 295 295 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg1 2523 2569 +46 (+1.82%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg2 2853 2692 -161 (-5.64%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg3 2522 2902 +380 (+15.07%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg4 2538 2837 +299 (+11.78%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_arg5 2598 2640 +42 (+1.62%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_event 691 617 -74 (-10.71%) 1487 1487 +0 (+0.00%) 92 92 +0 (+0.00%)
bpf_generic_tracepoint.o generic_tracepoint_output 41 36 -5 (-12.20%) 29 29 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_tracepoint_v53.o generic_tracepoint_actions 15139 14536 -603 (-3.98%) 41191 41191 +0 (+0.00%) 1397 1397 +0 (+0.00%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event 581 591 +10 (+1.72%) 1490 1490 +0 (+0.00%) 92 92 +0 (+0.00%)
bpf_generic_tracepoint_v53.o generic_tracepoint_output 136 136 +0 (+0.00%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_tracepoint_v61.o generic_tracepoint_actions 16298 14731 -1567 (-9.61%) 41191 41191 +0 (+0.00%) 1397 1397 +0 (+0.00%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event 555 531 -24 (-4.32%) 1490 1490 +0 (+0.00%) 92 92 +0 (+0.00%)
bpf_generic_tracepoint_v61.o generic_tracepoint_output 120 141 +21 (+17.50%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_actions 1767 1928 +161 (+9.11%) 5702 5702 +0 (+0.00%) 248 248 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_event 232 207 -25 (-10.78%) 429 429 +0 (+0.00%) 33 33 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg1 2764 2832 +68 (+2.46%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg2 2639 2675 +36 (+1.36%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg3 3875 2529 -1346 (-34.74%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg4 2646 2540 -106 (-4.01%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_filter_arg5 2510 2674 +164 (+6.53%) 6966 6966 +0 (+0.00%) 451 451 +0 (+0.00%)
bpf_generic_uprobe.o generic_uprobe_output 41 39 -2 (-4.88%) 29 29 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_uprobe_v53.o generic_uprobe_actions 14216 14310 +94 (+0.66%) 39443 39443 +0 (+0.00%) 1336 1336 +0 (+0.00%)
bpf_generic_uprobe_v53.o generic_uprobe_event 236 223 -13 (-5.51%) 433 433 +0 (+0.00%) 33 33 +0 (+0.00%)
bpf_generic_uprobe_v53.o generic_uprobe_output 148 144 -4 (-2.70%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_generic_uprobe_v61.o generic_uprobe_actions 14050 14958 +908 (+6.46%) 39443 39443 +0 (+0.00%) 1336 1336 +0 (+0.00%)
bpf_generic_uprobe_v61.o generic_uprobe_event 241 309 +68 (+28.22%) 433 433 +0 (+0.00%) 33 33 +0 (+0.00%)
bpf_generic_uprobe_v61.o generic_uprobe_output 138 146 +8 (+5.80%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_globals.o read_globals_test 0 0 +0 (+0.00%) 0 0 +0 (+0.00%) 0 0 +0 (+0.00%)
bpf_killer.o killer 27 28 +1 (+3.70%) 33 33 +0 (+0.00%) 3 3 +0 (+0.00%)
bpf_loader.o loader_kprobe 84 82 -2 (-2.38%) 144 144 +0 (+0.00%) 10 10 +0 (+0.00%)
bpf_lseek.o test_lseek 54 41 -13 (-24.07%) 67 67 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_killer.o killer 22 22 +0 (+0.00%) 33 33 +0 (+0.00%) 3 3 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_fmodret_override 108 73 -35 (-32.41%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_actions 29346 14095 -15251 (-51.97%) 42545 42545 +0 (+0.00%) 1434 1434 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_event 339 345 +6 (+1.77%) 585 585 +0 (+0.00%) 48 48 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_output 185 128 -57 (-30.81%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_multi_kprobe_v53.o generic_kprobe_override 62 41 -21 (-33.87%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_fmodret_override 71 91 +20 (+28.17%) 18 18 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_actions 16654 15088 -1566 (-9.40%) 42545 42545 +0 (+0.00%) 1434 1434 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_event 517 278 -239 (-46.23%) 585 585 +0 (+0.00%) 48 48 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_output 153 150 -3 (-1.96%) 252 252 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_multi_kprobe_v61.o generic_kprobe_override 40 51 +11 (+27.50%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event4 97822 89913 -7909 (-8.09%) 215757 215704 -53 (-0.02%) 14951 14942 -9 (-0.06%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event4 99896 96233 -3663 (-3.67%) 215757 215704 -53 (-0.02%) 14951 14942 -9 (-0.06%)
bpf_generic_kprobe.o generic_kprobe_process_event3 7581 7024 -557 (-7.35%) 19779 19680 -99 (-0.50%) 1348 1338 -10 (-0.74%)
bpf_generic_tracepoint.o generic_tracepoint_event3 7296 7587 +291 (+3.99%) 19779 19680 -99 (-0.50%) 1348 1338 -10 (-0.74%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event0 94250 96057 +1807 (+1.92%) 215685 215586 -99 (-0.05%) 14954 14938 -16 (-0.11%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event1 93947 95801 +1854 (+1.97%) 215701 215602 -99 (-0.05%) 14955 14941 -14 (-0.09%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event2 96306 95407 -899 (-0.93%) 215701 215602 -99 (-0.05%) 14955 14941 -14 (-0.09%)
bpf_generic_tracepoint_v53.o generic_tracepoint_event3 97718 90734 -6984 (-7.15%) 215698 215599 -99 (-0.05%) 14955 14941 -14 (-0.09%)
bpf_generic_uprobe.o generic_uprobe_process_event3 8127 6999 -1128 (-13.88%) 19779 19680 -99 (-0.50%) 1348 1338 -10 (-0.74%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event1 104517 90211 -14306 (-13.69%) 215701 215602 -99 (-0.05%) 14955 14941 -14 (-0.09%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event2 101025 90027 -10998 (-10.89%) 215701 215602 -99 (-0.05%) 14955 14941 -14 (-0.09%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event3 99776 95596 -4180 (-4.19%) 215698 215599 -99 (-0.05%) 14955 14941 -14 (-0.09%)
bpf_generic_kprobe.o generic_kprobe_process_event1 7239 7468 +229 (+3.16%) 19782 19681 -101 (-0.51%) 1348 1339 -9 (-0.67%)
bpf_generic_tracepoint.o generic_tracepoint_event1 7347 9822 +2475 (+33.69%) 19782 19681 -101 (-0.51%) 1348 1339 -9 (-0.67%)
bpf_generic_uprobe.o generic_uprobe_process_event1 8326 8041 -285 (-3.42%) 19782 19681 -101 (-0.51%) 1348 1339 -9 (-0.67%)
bpf_generic_kprobe.o generic_kprobe_process_event2 7415 7691 +276 (+3.72%) 19782 19680 -102 (-0.52%) 1348 1339 -9 (-0.67%)
bpf_generic_kprobe.o generic_kprobe_process_event4 8016 7572 -444 (-5.54%) 19760 19658 -102 (-0.52%) 1355 1344 -11 (-0.81%)
bpf_generic_tracepoint.o generic_tracepoint_event2 7218 7804 +586 (+8.12%) 19782 19680 -102 (-0.52%) 1348 1339 -9 (-0.67%)
bpf_generic_tracepoint.o generic_tracepoint_event4 7215 8109 +894 (+12.39%) 19760 19658 -102 (-0.52%) 1355 1344 -11 (-0.81%)
bpf_generic_uprobe.o generic_uprobe_process_event2 8183 7016 -1167 (-14.26%) 19782 19680 -102 (-0.52%) 1348 1339 -9 (-0.67%)
bpf_generic_uprobe.o generic_uprobe_process_event4 8072 7185 -887 (-10.99%) 19760 19658 -102 (-0.52%) 1355 1344 -11 (-0.81%)
bpf_generic_tracepoint.o generic_tracepoint_event0 7566 8026 +460 (+6.08%) 20592 20479 -113 (-0.55%) 1421 1409 -12 (-0.84%)
bpf_generic_kprobe_v53.o generic_kprobe_process_event0 102334 101040 -1294 (-1.26%) 283295 283172 -123 (-0.04%) 16044 16033 -11 (-0.07%)
bpf_multi_kprobe_v53.o generic_kprobe_process_event0 113628 100702 -12926 (-11.38%) 283295 283172 -123 (-0.04%) 16044 16033 -11 (-0.07%)
bpf_generic_uprobe.o generic_uprobe_process_event0 7804 8154 +350 (+4.48%) 21063 20890 -173 (-0.82%) 1419 1400 -19 (-1.34%)
bpf_generic_retkprobe_v53.o generic_retkprobe_event 108357 105058 -3299 (-3.04%) 231680 231505 -175 (-0.08%) 16131 16113 -18 (-0.11%)
bpf_multi_retkprobe_v53.o generic_retkprobe_event 127625 110224 -17401 (-13.63%) 231631 231456 -175 (-0.08%) 16130 16112 -18 (-0.11%)
bpf_generic_retkprobe_v61.o generic_retkprobe_event 10694 11197 +503 (+4.70%) 24960 24775 -185 (-0.74%) 1854 1842 -12 (-0.65%)
bpf_generic_uprobe_v53.o generic_uprobe_process_event0 103254 90496 -12758 (-12.36%) 215852 215620 -232 (-0.11%) 14972 14952 -20 (-0.13%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event0 2194 2133 -61 (-2.78%) 4395 4152 -243 (-5.53%) 329 312 -17 (-5.17%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event4 2094 1910 -184 (-8.79%) 4396 4124 -272 (-6.19%) 323 304 -19 (-5.88%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event4 1950 2031 +81 (+4.15%) 4396 4124 -272 (-6.19%) 323 304 -19 (-5.88%)
bpf_generic_retkprobe.o generic_retkprobe_event 11526 11239 -287 (-2.49%) 28282 28008 -274 (-0.97%) 1973 1949 -24 (-1.22%)
bpf_multi_retkprobe_v61.o generic_retkprobe_event 12110 9753 -2357 (-19.46%) 24404 24110 -294 (-1.20%) 1859 1841 -18 (-0.97%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg1 25215 26076 +861 (+3.41%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg2 24813 24288 -525 (-2.12%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg3 26494 24362 -2132 (-8.05%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg4 24373 24041 -332 (-1.36%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v53.o generic_kprobe_filter_arg5 26265 24317 -1948 (-7.42%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg1 25870 24169 -1701 (-6.58%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg2 26667 24070 -2597 (-9.74%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg3 27248 24758 -2490 (-9.14%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg4 27483 26107 -1376 (-5.01%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_kprobe_v61.o generic_kprobe_filter_arg5 26764 26316 -448 (-1.67%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg1 26569 23775 -2794 (-10.52%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg2 26853 24057 -2796 (-10.41%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg3 27067 24044 -3023 (-11.17%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg4 24410 23953 -457 (-1.87%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v53.o generic_tracepoint_arg5 30439 24792 -5647 (-18.55%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg1 27534 23721 -3813 (-13.85%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg2 28248 24052 -4196 (-14.85%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg3 29118 24012 -5106 (-17.54%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg4 33309 23915 -9394 (-28.20%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v61.o generic_tracepoint_arg5 28057 24983 -3074 (-10.96%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg1 28012 26052 -1960 (-7.00%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg2 27759 26451 -1308 (-4.71%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg3 27301 25856 -1445 (-5.29%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg4 26331 26187 -144 (-0.55%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v53.o generic_uprobe_filter_arg5 27284 26122 -1162 (-4.26%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg1 30324 26943 -3381 (-11.15%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg2 26755 26758 +3 (+0.01%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg3 28337 27992 -345 (-1.22%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg4 26332 27308 +976 (+3.71%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_uprobe_v61.o generic_uprobe_filter_arg5 27209 26780 -429 (-1.58%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg1 33490 23550 -9940 (-29.68%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg2 42586 24318 -18268 (-42.90%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg3 39256 24731 -14525 (-37.00%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg4 41607 23955 -17652 (-42.43%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v53.o generic_kprobe_filter_arg5 49382 24518 -24864 (-50.35%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg1 41140 26793 -14347 (-34.87%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg2 30326 26454 -3872 (-12.77%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg3 38517 24452 -14065 (-36.52%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg4 36157 24539 -11618 (-32.13%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_multi_kprobe_v61.o generic_kprobe_filter_arg5 40673 25657 -15016 (-36.92%) 91872 91575 -297 (-0.32%) 2910 2900 -10 (-0.34%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event0 2128 2058 -70 (-3.29%) 4403 4100 -303 (-6.88%) 326 305 -21 (-6.44%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event1 1982 2028 +46 (+2.32%) 4409 4106 -303 (-6.87%) 328 304 -24 (-7.32%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event2 2357 2054 -303 (-12.86%) 4409 4106 -303 (-6.87%) 328 304 -24 (-7.32%)
bpf_generic_tracepoint_v61.o generic_tracepoint_event3 2018 1835 -183 (-9.07%) 4406 4103 -303 (-6.88%) 328 304 -24 (-7.32%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event1 1885 1832 -53 (-2.81%) 4409 4106 -303 (-6.87%) 328 304 -24 (-7.32%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event2 2775 1966 -809 (-29.15%) 4409 4106 -303 (-6.87%) 328 304 -24 (-7.32%)
bpf_generic_uprobe_v61.o generic_uprobe_process_event3 3237 2004 -1233 (-38.09%) 4406 4103 -303 (-6.88%) 328 304 -24 (-7.32%)
bpf_generic_kprobe.o generic_kprobe_process_event0 7817 7945 +128 (+1.64%) 21321 21001 -320 (-1.50%) 1440 1403 -37 (-2.57%)
bpf_execve_event.o event_execve 12147 12843 +696 (+5.73%) 35096 34723 -373 (-1.06%) 2278 2251 -27 (-1.19%)
bpf_execve_event_v61.o event_execve 6094 6059 -35 (-0.57%) 27456 26871 -585 (-2.13%) 671 636 -35 (-5.22%)
bpf_execve_event_v53.o event_execve 97457 98430 +973 (+1.00%) 245365 239363 -6002 (-2.45%) 15430 15334 -96 (-0.62%)
bpf_generic_kprobe_v53.o generic_kprobe_process_filter 57465 54691 -2774 (-4.83%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%)
bpf_generic_kprobe_v61.o generic_kprobe_process_filter 57674 51652 -6022 (-10.44%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%)
bpf_generic_tracepoint_v53.o generic_tracepoint_filter 64076 50012 -14064 (-21.95%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%)
bpf_generic_tracepoint_v61.o generic_tracepoint_filter 63620 50068 -13552 (-21.30%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%)
bpf_generic_uprobe_v53.o generic_uprobe_process_filter 65621 56496 -9125 (-13.91%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%)
bpf_generic_uprobe_v61.o generic_uprobe_process_filter 62774 56727 -6047 (-9.63%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%)
bpf_multi_kprobe_v53.o generic_kprobe_process_filter 73918 54826 -19092 (-25.83%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%)
bpf_multi_kprobe_v61.o generic_kprobe_process_filter 73994 54979 -19015 (-25.70%) 166600 158639 -7961 (-4.78%) 7263 6602 -661 (-9.10%)
bpf_generic_kprobe_v61.o generic_kprobe_process_event0 11184 10303 -881 (-7.88%) 58564 49822 -8742 (-14.93%) 1243 1108 -135 (-10.86%)
bpf_multi_kprobe_v61.o generic_kprobe_process_event0 17270 9818 -7452 (-43.15%) 58564 49822 -8742 (-14.93%) 1243 1108 -135 (-10.86%)
bpf_generic_kprobe.o generic_kprobe_process_filter 43093 31779 -11314 (-26.25%) 77948 66684 -11264 (-14.45%) 6048 5009 -1039 (-17.18%)
bpf_generic_tracepoint.o generic_tracepoint_filter 41153 33891 -7262 (-17.65%) 77948 66684 -11264 (-14.45%) 6048 5009 -1039 (-17.18%)
bpf_generic_uprobe.o generic_uprobe_process_filter 40999 31572 -9427 (-22.99%) 77948 66684 -11264 (-14.45%) 6048 5009 -1039 (-17.18%)
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-10-31 5:03 ` [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states Andrii Nakryiko
@ 2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 16:13 ` Alexei Starovoitov
0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team
On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> Instead of allocating and copying jump history each time we enqueue
> child verifier state, switch to a model where we use one common
> dynamically sized array of instruction jumps across all states.
>
> The key observation for proving this is correct is that jmp_history is
> only relevant while state is active, which means it either is a current
> state (and thus we are actively modifying jump history and no other
> state can interfere with us) or we are checkpointed state with some
> children still active (either enqueued or being current).
>
> In the latter case our portion of jump history is finalized and won't
> change or grow, so as long as we keep it immutable until the state is
> finalized, we are good.
>
> Now, when state is finalized and is put into state hash for potentially
> future pruning lookups, jump history is not used anymore. This is
> because jump history is only used by precision marking logic, and we
> never modify precision markings for finalized states.
>
> So, instead of each state having its own small jump history, we keep
> a global dynamically-sized jump history, where each state in current DFS
> path from root to active state remembers its portion of jump history.
> Current state can append to this history, but cannot modify any of its
> parent histories.
>
> Because the jmp_history array can be grown through realloc, states don't
> keep pointers, they instead maintain two indexes [start, end) into
> global jump history array. End is exclusive index, so start == end means
> there is no relevant jump history.
>
> This should eliminate a lot of allocations and minimize overall memory
> usage (but I haven't benchmarked on real hardware, and QEMU benchmarking
> is too noisy).
>
> Also, in the next patch we'll extend jump history to maintain additional
> markings for some instructions even if there was no jump, so in
> preparation for that call this thing a more generic "instruction history".
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Nitpick: could you please add a comment somewhere in the code
(is_state_visited? pop_stack?) saying something like this:
states in the env->head happen to be sorted by insn_hist_end in
descending order, so popping next state for verification poses no
risk of overwriting history relevant for states remaining in
env->head.
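E.g., a minimal sketch of how such a comment could look in pop_stack()
(placement is just a suggestion):

        /* Note: states in env->head happen to be sorted by insn_hist_end
         * in descending order (a consequence of DFS order), so popping
         * the next state for verification cannot overwrite the history
         * range still referenced by states remaining in env->head.
         */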
Side note: this change would make it harder to change states traversal
order to something other than DFS, should we choose to do so.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
2023-10-31 5:03 ` [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
@ 2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 17:20 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team, Tao Lyu
On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
All makes sense, a few nitpicks below.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
> +/* instruction history flags, used in bpf_insn_hist_entry.flags field */
> +enum {
> + /* instruction references stack slot through PTR_TO_STACK register;
> + * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
> + * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512,
> + * 8 bytes per slot, so slot index (spi) is [0, 63])
> + */
> + INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */
> +
> + INSN_F_SPI_MASK = 0x3f, /* 6 bits */
> + INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */
> +
> + INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */
> +};
> +
> +static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
> +static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
> +
> struct bpf_insn_hist_entry {
> - u32 prev_idx;
> u32 idx;
> + /* insn idx can't be bigger than 1 million */
> + u32 prev_idx : 22;
> + /* special flags, e.g., whether insn is doing register stack spill/load */
> + u32 flags : 10;
> };
Nitpick: maybe use separate bit-fields for frameno and spi instead of
flags? Or add dedicated accessor functions?
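For reference, the kind of accessors I have in mind (hypothetical names,
compile-untested sketch based on the masks above):

        static inline u32 insn_hist_frameno(const struct bpf_insn_hist_entry *e)
        {
                return e->flags & INSN_F_FRAMENO_MASK;
        }

        static inline u32 insn_hist_spi(const struct bpf_insn_hist_entry *e)
        {
                return (e->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
        }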
>
> -#define MAX_CALL_FRAMES 8
> /* Maximum number of register states that can exist at once */
> #define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES)
> struct bpf_verifier_state {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2905ce2e8b34..fbb779583d52 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3479,14 +3479,20 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
> }
>
> /* for any branch, call, exit record the history of jmps in the given state */
> -static int push_jmp_history(struct bpf_verifier_env *env,
> - struct bpf_verifier_state *cur)
> +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> + int insn_flags)
> {
> struct bpf_insn_hist_entry *p;
> size_t alloc_size;
>
> - if (!is_jmp_point(env, env->insn_idx))
> + /* combine instruction flags if we already recorded this instruction */
> + if (cur->insn_hist_end > cur->insn_hist_start &&
> + (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> + p->idx == env->insn_idx &&
> + p->prev_idx == env->prev_insn_idx) {
> + p->flags |= insn_flags;
Nitpick: maybe add an assert to check that frameno/spi are not or'ed?
[...]
> +static struct bpf_insn_hist_entry *get_hist_insn_entry(struct bpf_verifier_env *env,
> + u32 hist_start, u32 hist_end, int insn_idx)
Nitpick: maybe rename 'hist_insn' to 'insn_hist', i.e. 'get_insn_hist_entry'?
[...]
> @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>
> /* Mark slots affected by this stack write. */
> for (i = 0; i < size; i++)
> - state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> - type;
> + state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> + insn_flags = 0; /* not a register spill */
> }
> +
> + if (insn_flags)
> + return push_insn_history(env, env->cur_state, insn_flags);
Maybe add a check that insn is BPF_ST or BPF_STX here?
Only these cases are supported by backtrack_insn() while
check_mem_access() is called from multiple places.
> return 0;
> }
>
> @@ -4908,6 +4909,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
> struct bpf_reg_state *reg;
> u8 *stype, type;
> + int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | reg_state->frameno;
>
> stype = reg_state->stack[spi].slot_type;
> reg = ®_state->stack[spi].spilled_ptr;
> @@ -4953,12 +4955,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> return -EACCES;
> }
> mark_reg_unknown(env, state->regs, dst_regno);
> + insn_flags = 0; /* not restoring original register state */
> }
> state->regs[dst_regno].live |= REG_LIVE_WRITTEN;
> - return 0;
> - }
> -
> - if (dst_regno >= 0) {
> + } else if (dst_regno >= 0) {
> /* restore register state from stack */
> copy_register_state(&state->regs[dst_regno], reg);
> /* mark reg as written since spilled pointer state likely
> @@ -4994,7 +4994,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64);
> if (dst_regno >= 0)
> mark_reg_stack_read(env, reg_state, off, off + size, dst_regno);
> + insn_flags = 0; /* we are not restoring spilled register */
> }
> + if (insn_flags)
> + return push_insn_history(env, env->cur_state, insn_flags);
> return 0;
> }
>
> @@ -7125,7 +7128,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
> if (err)
> return err;
> -
> return 0;
> }
>
> @@ -17001,7 +17003,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> * the precision needs to be propagated back in
> * the current state.
> */
> - err = err ? : push_jmp_history(env, cur);
> + if (is_jmp_point(env, env->insn_idx))
> + err = err ? : push_insn_history(env, cur, 0);
> err = err ? : propagate_precision(env, &sl->state);
> if (err)
> return err;
> @@ -17265,7 +17268,7 @@ static int do_check(struct bpf_verifier_env *env)
> }
>
> if (is_jmp_point(env, env->insn_idx)) {
> - err = push_jmp_history(env, state);
> + err = push_insn_history(env, state, 0);
> if (err)
> return err;
> }
> diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
> index db6b3143338b..88c4207c6b4c 100644
> --- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
> +++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
> @@ -487,7 +487,24 @@ __success __log_level(2)
> * so we won't be able to mark stack slot fp-8 as precise, and so will
> * fallback to forcing all as precise
> */
> -__msg("mark_precise: frame0: falling back to forcing all scalars precise")
> +__msg("10: (0f) r1 += r7")
> +__msg("mark_precise: frame0: last_idx 10 first_idx 7 subseq_idx -1")
> +__msg("mark_precise: frame0: regs=r7 stack= before 9: (bf) r1 = r8")
> +__msg("mark_precise: frame0: regs=r7 stack= before 8: (27) r7 *= 4")
> +__msg("mark_precise: frame0: regs=r7 stack= before 7: (79) r7 = *(u64 *)(r10 -8)")
> +__msg("mark_precise: frame0: parent state regs= stack=-8: R0_w=2 R6_w=1 R8_rw=map_value(off=0,ks=4,vs=16,imm=0) R10=fp0 fp-8_rw=P1")
> +__msg("mark_precise: frame0: last_idx 18 first_idx 0 subseq_idx 7")
> +__msg("mark_precise: frame0: regs= stack=-8 before 18: (95) exit")
> +__msg("mark_precise: frame1: regs= stack= before 17: (0f) r0 += r2")
> +__msg("mark_precise: frame1: regs= stack= before 16: (79) r2 = *(u64 *)(r1 +0)")
> +__msg("mark_precise: frame1: regs= stack= before 15: (79) r0 = *(u64 *)(r10 -16)")
> +__msg("mark_precise: frame1: regs= stack= before 14: (7b) *(u64 *)(r10 -16) = r2")
> +__msg("mark_precise: frame1: regs= stack= before 13: (7b) *(u64 *)(r1 +0) = r2")
> +__msg("mark_precise: frame1: regs=r2 stack= before 6: (85) call pc+6")
> +__msg("mark_precise: frame0: regs=r2 stack= before 5: (bf) r2 = r6")
> +__msg("mark_precise: frame0: regs=r6 stack= before 4: (07) r1 += -8")
> +__msg("mark_precise: frame0: regs=r6 stack= before 3: (bf) r1 = r10")
> +__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
> __naked int subprog_spill_into_parent_stack_slot_precise(void)
> {
> asm volatile (
> @@ -522,14 +539,68 @@ __naked int subprog_spill_into_parent_stack_slot_precise(void)
> );
> }
>
> -__naked __noinline __used
> -static __u64 subprog_with_checkpoint(void)
> +SEC("?raw_tp")
> +__success __log_level(2)
> +__msg("17: (0f) r1 += r0")
> +__msg("mark_precise: frame0: last_idx 17 first_idx 0 subseq_idx -1")
> +__msg("mark_precise: frame0: regs=r0 stack= before 16: (bf) r1 = r7")
> +__msg("mark_precise: frame0: regs=r0 stack= before 15: (27) r0 *= 4")
> +__msg("mark_precise: frame0: regs=r0 stack= before 14: (79) r0 = *(u64 *)(r10 -16)")
> +__msg("mark_precise: frame0: regs= stack=-16 before 13: (7b) *(u64 *)(r7 -8) = r0")
> +__msg("mark_precise: frame0: regs=r0 stack= before 12: (79) r0 = *(u64 *)(r8 +16)")
> +__msg("mark_precise: frame0: regs= stack=-16 before 11: (7b) *(u64 *)(r8 +16) = r0")
> +__msg("mark_precise: frame0: regs=r0 stack= before 10: (79) r0 = *(u64 *)(r7 -8)")
> +__msg("mark_precise: frame0: regs= stack=-16 before 9: (7b) *(u64 *)(r10 -16) = r0")
> +__msg("mark_precise: frame0: regs=r0 stack= before 8: (07) r8 += -32")
> +__msg("mark_precise: frame0: regs=r0 stack= before 7: (bf) r8 = r10")
> +__msg("mark_precise: frame0: regs=r0 stack= before 6: (07) r7 += -8")
> +__msg("mark_precise: frame0: regs=r0 stack= before 5: (bf) r7 = r10")
> +__msg("mark_precise: frame0: regs=r0 stack= before 21: (95) exit")
> +__msg("mark_precise: frame1: regs=r0 stack= before 20: (bf) r0 = r1")
> +__msg("mark_precise: frame1: regs=r1 stack= before 4: (85) call pc+15")
> +__msg("mark_precise: frame0: regs=r1 stack= before 3: (bf) r1 = r6")
> +__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1")
> +__naked int stack_slot_aliases_precision(void)
> {
> asm volatile (
> - "r0 = 0;"
> - /* guaranteed checkpoint if BPF_F_TEST_STATE_FREQ is used */
> - "goto +0;"
> + "r6 = 1;"
> + /* pass r6 through r1 into subprog to get it back as r0;
> + * this whole chain will have to be marked as precise later
> + */
> + "r1 = r6;"
> + "call identity_subprog;"
> + /* let's setup two registers that are aliased to r10 */
> + "r7 = r10;"
> + "r7 += -8;" /* r7 = r10 - 8 */
> + "r8 = r10;"
> + "r8 += -32;" /* r8 = r10 - 32 */
> + /* now spill subprog's return value (a r6 -> r1 -> r0 chain)
> + * a few times through different stack pointer regs, making
> + * sure to use r10, r7, and r8 both in LDX and STX insns, and
> + * *importantly* also using a combination of const var_off and
> + * insn->off to validate that we record final stack slot
> + * correctly, instead of relying on just insn->off derivation,
> + * which is only valid for r10-based stack offset
> + */
> + "*(u64 *)(r10 - 16) = r0;"
> + "r0 = *(u64 *)(r7 - 8);" /* r7 - 8 == r10 - 16 */
> + "*(u64 *)(r8 + 16) = r0;" /* r8 + 16 = r10 - 16 */
> + "r0 = *(u64 *)(r8 + 16);"
> + "*(u64 *)(r7 - 8) = r0;"
> + "r0 = *(u64 *)(r10 - 16);"
> + /* get ready to use r0 as an index into array to force precision */
> + "r0 *= 4;"
> + "r1 = %[vals];"
> + /* here r0->r1->r6 chain is forced to be precise and has to be
> + * propagated back to the beginning, including through the
> + * subprog call and all the stack spills and loads
> + */
> + "r1 += r0;"
> + "r0 = *(u32 *)(r1 + 0);"
> "exit;"
> + :
> + : __imm_ptr(vals)
> + : __clobber_common, "r6"
> );
> }
>
> diff --git a/tools/testing/selftests/bpf/verifier/precise.c b/tools/testing/selftests/bpf/verifier/precise.c
> index 0d84dd1f38b6..8a2ff81d8350 100644
> --- a/tools/testing/selftests/bpf/verifier/precise.c
> +++ b/tools/testing/selftests/bpf/verifier/precise.c
> @@ -140,10 +140,11 @@
> .result = REJECT,
> },
> {
> - "precise: ST insn causing spi > allocated_stack",
> + "precise: ST zero to stack insn is supported",
> .insns = {
> BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
> BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
> + /* not a register spill, so we stop precision propagation for R4 here */
> BPF_ST_MEM(BPF_DW, BPF_REG_3, -8, 0),
> BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
> BPF_MOV64_IMM(BPF_REG_0, -1),
> @@ -157,11 +158,11 @@
> mark_precise: frame0: last_idx 4 first_idx 2\
> mark_precise: frame0: regs=r4 stack= before 4\
> mark_precise: frame0: regs=r4 stack= before 3\
> - mark_precise: frame0: regs= stack=-8 before 2\
> - mark_precise: frame0: falling back to forcing all scalars precise\
> - force_precise: frame0: forcing r0 to be precise\
> mark_precise: frame0: last_idx 5 first_idx 5\
> - mark_precise: frame0: parent state regs= stack=:",
> + mark_precise: frame0: parent state regs=r0 stack=:\
> + mark_precise: frame0: last_idx 4 first_idx 2\
> + mark_precise: frame0: regs=r0 stack= before 4\
> + 5: R0=-1 R4=0",
> .result = VERBOSE_ACCEPT,
> .retval = -1,
> },
> @@ -169,6 +170,8 @@
> "precise: STX insn causing spi > allocated_stack",
> .insns = {
> BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
> + /* make later reg spill more interesting by having somewhat known scalar */
> + BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xff),
> BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
> BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0),
> BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, -8),
> @@ -179,18 +182,21 @@
> },
> .prog_type = BPF_PROG_TYPE_XDP,
> .flags = BPF_F_TEST_STATE_FREQ,
> - .errstr = "mark_precise: frame0: last_idx 6 first_idx 6\
> + .errstr = "mark_precise: frame0: last_idx 7 first_idx 7\
> mark_precise: frame0: parent state regs=r4 stack=:\
> - mark_precise: frame0: last_idx 5 first_idx 3\
> - mark_precise: frame0: regs=r4 stack= before 5\
> - mark_precise: frame0: regs=r4 stack= before 4\
> - mark_precise: frame0: regs= stack=-8 before 3\
> - mark_precise: frame0: falling back to forcing all scalars precise\
> - force_precise: frame0: forcing r0 to be precise\
> - force_precise: frame0: forcing r0 to be precise\
> - force_precise: frame0: forcing r0 to be precise\
> - force_precise: frame0: forcing r0 to be precise\
> - mark_precise: frame0: last_idx 6 first_idx 6\
> + mark_precise: frame0: last_idx 6 first_idx 4\
> + mark_precise: frame0: regs=r4 stack= before 6: (b7) r0 = -1\
> + mark_precise: frame0: regs=r4 stack= before 5: (79) r4 = *(u64 *)(r10 -8)\
> + mark_precise: frame0: regs= stack=-8 before 4: (7b) *(u64 *)(r3 -8) = r0\
> + mark_precise: frame0: parent state regs=r0 stack=:\
> + mark_precise: frame0: last_idx 3 first_idx 3\
> + mark_precise: frame0: regs=r0 stack= before 3: (55) if r3 != 0x7b goto pc+0\
> + mark_precise: frame0: regs=r0 stack= before 2: (bf) r3 = r10\
> + mark_precise: frame0: regs=r0 stack= before 1: (57) r0 &= 255\
> + mark_precise: frame0: parent state regs=r0 stack=:\
> + mark_precise: frame0: last_idx 0 first_idx 0\
> + mark_precise: frame0: regs=r0 stack= before 0: (85) call bpf_get_prandom_u32#7\
> + mark_precise: frame0: last_idx 7 first_idx 7\
> mark_precise: frame0: parent state regs= stack=:",
> .result = VERBOSE_ACCEPT,
> .retval = -1,
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-10-31 5:03 ` [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return Andrii Nakryiko
@ 2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 17:32 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team
On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > Given verifier checks actual value, r0 has to be precise, so we need to
> > propagate precision properly.
> >
> > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
I don't follow why this is necessary, could you please conjure
an example showing that current behavior is not safe?
This example could be used as a test case, as this change
seems to not be covered by test cases.
> > ---
> > kernel/bpf/verifier.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index fbb779583d52..098ba0e1a6ff 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -9739,6 +9739,12 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
> > verbose(env, "R0 not a scalar value\n");
> > return -EACCES;
> > }
> > +
> > + /* we are going to enforce precise value, mark r0 precise */
> > + err = mark_chain_precision(env, BPF_REG_0);
> > + if (err)
> > + return err;
> > +
> > if (!tnum_in(range, r0->var_off)) {
> > verbose_invalid_scalar(env, r0, &range, "callback return", "R0");
> > return -EINVAL;
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer
2023-10-31 5:03 ` [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
@ 2023-11-09 15:20 ` Eduard Zingerman
0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team
On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > When register is spilled onto a stack as a 1/2/4-byte register, we set
> > slot_type[BPF_REG_SIZE - 1] (plus potentially few more below it,
> > depending on actual spill size). So to check if some stack slot has
> > spilled register we need to consult slot_type[7], not slot_type[0].
> >
> > To avoid the need to remember and double-check this in the future, just
> > use is_spilled_reg() helper.
> >
> > Fixes: 638f5b90d460 ("bpf: reduce verifier memory consumption")
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
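To spell out the byte layout being fixed here (a sketch, not from the patch):

        /* one 8-byte stack slot is tracked as u8 slot_type[8];
         * a 4-byte register spill marks the top bytes, i.e.
         * slot_type[7], [6], [5], [4] = STACK_SPILL, while slot_type[0]
         * keeps its old marking, so spill checks must look at
         * slot_type[BPF_REG_SIZE - 1], which is what is_spilled_reg() does
         */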
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > ---
> > kernel/bpf/verifier.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 098ba0e1a6ff..82992c32c1bd 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -4622,7 +4622,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> > * so it's aligned access and [off, off + size) are within stack limits
> > */
> > if (!env->allow_ptr_leaks &&
> > - state->stack[spi].slot_type[0] == STACK_SPILL &&
> > + is_spilled_reg(&state->stack[spi]) &&
> > size != BPF_REG_SIZE) {
> > verbose(env, "attempt to corrupt spilled pointer on stack\n");
> > return -EACCES;
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills
2023-10-31 5:03 ` [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
@ 2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 17:37 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team
On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve it in
> situations where this is possible. E.g., when spilling register as
> 1/2/4-byte subslots on the stack, all the remaining bytes in the stack
> slot do not automatically become unknown. If we knew they contained
> zeroes, we can preserve those STACK_ZERO markers.
>
> Add a helper mark_stack_slot_misc(), similar to scrub_spilled_slot(),
> but that doesn't overwrite either STACK_INVALID nor STACK_ZERO. Note
> that we need to take into account possibility of being in unprivileged
> mode, in which case STACK_INVALID is forced to STACK_MISC for correctness,
> as treating STACK_INVALID as equivalent STACK_MISC is only enabled in
> privileged mode.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Could you please add a test case?
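E.g., the pattern to exercise might look like this (BPF asm sketch, untested):

        "*(u64 *)(r10 - 8) = 0;"        /* fp-8 .. fp-1 are known zeroes */
        "*(u8 *)(r10 - 8) = r7;"        /* 1-byte spill of unknown r7 */
        "r4 = *(u8 *)(r10 - 7);"        /* untouched bytes should keep their
                                         * STACK_ZERO marks, so r4 should be
                                         * a known 0 */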
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
> @@ -1355,6 +1355,21 @@ static void scrub_spilled_slot(u8 *stype)
> *stype = STACK_MISC;
> }
>
> +/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
> + * case they are equivalent, or it's STACK_ZERO, in which case we preserve
> + * more precise STACK_ZERO.
> > + * Note, in unprivileged mode leaving STACK_INVALID is wrong, so we take
> + * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
> + */
> +static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
Nitpick: I find this name misleading, maybe something like "remove_spill_mark"?
[...]
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore
2023-10-31 5:03 ` [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
@ 2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 17:41 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:20 UTC (permalink / raw)
To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team
On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
> stack from slot that has register spilled into it and that register has
> a constant value zero, preserve that zero and mark spilled register as
> precise for that. This makes spilled const zero register and STACK_ZERO
> cases equivalent in their behavior.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Could you please add a test case?
[...]
> ---
> kernel/bpf/verifier.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0eecc6b3109c..8cfe060e4938 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4958,22 +4958,39 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> copy_register_state(&state->regs[dst_regno], reg);
> state->regs[dst_regno].subreg_def = subreg_def;
> } else {
[...]
> +
> + if (spill_cnt == size &&
> + tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
> + __mark_reg_const_zero(&state->regs[dst_regno]);
> + /* this IS register fill, so keep insn_flags */
> + } else if (zero_cnt == size) {
> + /* similarly to mark_reg_stack_read(), preserve zeroes */
> + __mark_reg_const_zero(&state->regs[dst_regno]);
> + insn_flags = 0; /* not restoring original register state */
> + } else {
> + mark_reg_unknown(env, state->regs, dst_regno);
> + insn_flags = 0; /* not restoring original register state */
> + }
Condition for this branch is (off % BPF_REG_SIZE != 0) || size != spill_size,
is it necessary to check for some unusual offsets, e.g. off % BPF_REG_SIZE == 7
or something like that?
[...]
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
2023-10-31 5:03 ` [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
2023-10-31 5:22 ` Andrii Nakryiko
@ 2023-11-09 15:21 ` Eduard Zingerman
2023-11-09 17:43 ` Andrii Nakryiko
1 sibling, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 15:21 UTC (permalink / raw)
To: Andrii Nakryiko, bpf, ast, daniel, martin.lau; +Cc: kernel-team
On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> track aligned STACK_ZERO cases as imprecise spilled registers
Great improvement.
Could you please add a test case?
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-11-09 15:20 ` Eduard Zingerman
@ 2023-11-09 16:13 ` Alexei Starovoitov
2023-11-09 17:28 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2023-11-09 16:13 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > Instead of allocating and copying jump history each time we enqueue
> > child verifier state, switch to a model where we use one common
> > dynamically sized array of instruction jumps across all states.
> >
> > The key observation for proving this is correct is that jmp_history is
> > only relevant while state is active, which means it either is a current
> > state (and thus we are actively modifying jump history and no other
> > state can interfere with us) or we are checkpointed state with some
> > children still active (either enqueued or being current).
> >
> > In the latter case our portion of jump history is finalized and won't
> > change or grow, so as long as we keep it immutable until the state is
> > finalized, we are good.
> >
> > Now, when state is finalized and is put into state hash for potentially
> > future pruning lookups, jump history is not used anymore. This is
> > because jump history is only used by precision marking logic, and we
> > never modify precision markings for finalized states.
> >
> > So, instead of each state having its own small jump history, we keep
> > a global dynamically-sized jump history, where each state in current DFS
> > path from root to active state remembers its portion of jump history.
> > Current state can append to this history, but cannot modify any of its
> > parent histories.
> >
> > Because the jmp_history array can be grown through realloc, states don't
> > keep pointers, they instead maintain two indexes [start, end) into
> > global jump history array. End is exclusive index, so start == end means
> > there is no relevant jump history.
> >
> > This should eliminate a lot of allocations and minimize overall memory
> > usage (but I haven't benchmarked on real hardware, and QEMU benchmarking
> > is too noisy).
> >
> > Also, in the next patch we'll extend jump history to maintain additional
> > markings for some instructions even if there was no jump, so in
> > preparation for that call this thing a more generic "instruction history".
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Nitpick: could you please add a comment somewhere in the code
> (is_state_visited? pop_stack?) saying something like this:
>
> states in the env->head happen to be sorted by insn_hist_end in
> descending order, so popping next state for verification poses no
> risk of overwriting history relevant for states remaining in
> env->head.
>
> Side note: this change would make it harder to change states traversal
> order to something other than DFS, should we choose to do so.
I have the same concern.
When we discussed different algorithms to solve the open-coded-iters/bpf_loop
issue, non-DFS ideas came up multiple times.
To be fair, I didn't like them, because I wanted to preserve the DFS property :)
but I feel sooner or later we will be forced to explore non-DFS.
So I think this patch is a no-go. There is really no need to rely on DFS here.
Let instruction history consume more memory. It's a better long-term trade-off.
We don't do strict DFS today.
The speculative execution analysis is DFS, but it visits paths
multiple times, so it's not a canonical DFS.
It probably doesn't break this particular insn_hist approach,
but it still feels too fragile to rely on the DFS assumption long term.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
2023-11-09 15:20 ` Eduard Zingerman
@ 2023-11-09 17:20 ` Andrii Nakryiko
2023-11-09 18:20 ` Eduard Zingerman
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:20 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
Tao Lyu
On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
>
> All makes sense, a few nitpicks below.
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > +/* instruction history flags, used in bpf_insn_hist_entry.flags field */
> > +enum {
> > + /* instruction references stack slot through PTR_TO_STACK register;
> > + * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
> > + * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512,
> > + * 8 bytes per slot, so slot index (spi) is [0, 63])
> > + */
> > + INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */
> > +
> > + INSN_F_SPI_MASK = 0x3f, /* 6 bits */
> > + INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */
> > +
> > + INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */
> > +};
> > +
> > +static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
> > +static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
> > +
> > struct bpf_insn_hist_entry {
> > - u32 prev_idx;
> > u32 idx;
> > + /* insn idx can't be bigger than 1 million */
> > + u32 prev_idx : 22;
> > + /* special flags, e.g., whether insn is doing register stack spill/load */
> > + u32 flags : 10;
> > };
>
> Nitpick: maybe use separate bit-fields for frameno and spi instead of
> flags? Or add dedicated accessor functions?
I wanted to keep it very uniform so that push_insn_history() doesn't
know about all such details. It just has "flags". We might use these
flags for some other use cases, though if we run out of bits we'll
probably just expand bpf_insn_hist_entry and refactor existing code
anyway. So, basically, I didn't want to over-engineer this bit too
much :)
>
> >
> > -#define MAX_CALL_FRAMES 8
> > /* Maximum number of register states that can exist at once */
> > #define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES)
> > struct bpf_verifier_state {
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 2905ce2e8b34..fbb779583d52 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -3479,14 +3479,20 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
> > }
> >
> > /* for any branch, call, exit record the history of jmps in the given state */
> > -static int push_jmp_history(struct bpf_verifier_env *env,
> > - struct bpf_verifier_state *cur)
> > +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> > + int insn_flags)
> > {
> > struct bpf_insn_hist_entry *p;
> > size_t alloc_size;
> >
> > - if (!is_jmp_point(env, env->insn_idx))
> > + /* combine instruction flags if we already recorded this instruction */
> > + if (cur->insn_hist_end > cur->insn_hist_start &&
> > + (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> > + p->idx == env->insn_idx &&
> > + p->prev_idx == env->prev_insn_idx) {
> > + p->flags |= insn_flags;
>
> Nitpick: maybe add an assert to check that frameno/spi are not or'ed?
ok, something like
WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS | INSN_F_FRAMENO_MASK |
(INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
?
>
> [...]
>
> > +static struct bpf_insn_hist_entry *get_hist_insn_entry(struct bpf_verifier_env *env,
> > + u32 hist_start, u32 hist_end, int insn_idx)
>
> Nitpick: maybe rename 'hist_insn' to 'insn_hist', i.e. 'get_insn_hist_entry'?
sure, good point, done
>
> [...]
>
> > @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> >
> > /* Mark slots affected by this stack write. */
> > for (i = 0; i < size; i++)
> > - state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> > - type;
> > + state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> > + insn_flags = 0; /* not a register spill */
> > }
> > +
> > + if (insn_flags)
> > + return push_insn_history(env, env->cur_state, insn_flags);
>
> Maybe add a check that insn is BPF_ST or BPF_STX here?
> Only these cases are supported by backtrack_insn() while
> check_mem_access() is called from multiple places.
seems like the wrong place to enforce that check_stack_write_fixed_off()
is called only for those instructions?
>
> > return 0;
> > }
> >
> > @@ -4908,6 +4909,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> > int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
> > struct bpf_reg_state *reg;
> > u8 *stype, type;
> > + int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | reg_state->frameno;
> >
> > stype = reg_state->stack[spi].slot_type;
> > reg = ®_state->stack[spi].spilled_ptr;
[...]
trimming is good
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-11-09 16:13 ` Alexei Starovoitov
@ 2023-11-09 17:28 ` Andrii Nakryiko
2023-11-09 19:29 ` Alexei Starovoitov
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:28 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 8:14 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > Instead of allocating and copying jump history each time we enqueue
> > > child verifier state, switch to a model where we use one common
> > > dynamically sized array of instruction jumps across all states.
> > >
> > > The key observation for proving this is correct is that jmp_history is
> > > only relevant while state is active, which means it either is a current
> > > state (and thus we are actively modifying jump history and no other
> > > state can interfere with us) or we are checkpointed state with some
> > > children still active (either enqueued or being current).
> > >
> > > In the latter case our portion of jump history is finalized and won't
> > > change or grow, so as long as we keep it immutable until the state is
> > > finalized, we are good.
> > >
> > > Now, when state is finalized and is put into state hash for potentially
> > > future pruning lookups, jump history is not used anymore. This is
> > > because jump history is only used by precision marking logic, and we
> > > never modify precision markings for finalized states.
> > >
> > > So, instead of each state having its own small jump history, we keep
> > > a global dynamically-sized jump history, where each state in current DFS
> > > path from root to active state remembers its portion of jump history.
> > > Current state can append to this history, but cannot modify any of its
> > > parent histories.
> > >
> > > Because the jmp_history array can be grown through realloc, states don't
> > > keep pointers, they instead maintain two indexes [start, end) into
> > > global jump history array. End is exclusive index, so start == end means
> > > there is no relevant jump history.
> > >
> > > This should eliminate a lot of allocations and minimize overall memory
> > > usage (but I haven't benchmarked on real hardware, and QEMU benchmarking
> > > is too noisy).
> > >
> > > Also, in the next patch we'll extend jump history to maintain additional
> > > markings for some instructions even if there was no jump, so in
> > > preparation for that call this thing a more generic "instruction history".
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > Nitpick: could you please add a comment somewhere in the code
> > (is_state_visited? pop_stack?) saying something like this:
> >
> > states in the env->head happen to be sorted by insn_hist_end in
> > descending order, so popping next state for verification poses no
> > risk of overwriting history relevant for states remaining in
> > env->head.
> >
> > Side note: this change would make it harder to change states traversal
> > order to something other than DFS, should we choose to do so.
>
> I have the same concern.
>
> When we discussed different algorithms to solve the open-coded-iters/bpf_loop
> issue, non-DFS ideas came up multiple times.
> To be fair, I didn't like them, because I wanted to preserve the DFS property :)
> but I feel sooner or later we will be forced to explore non-DFS.
> So I think this patch is a no-go. There is really no need to rely on DFS here.
If we ever break the DFS property, we can easily change this. Or we can
even have a hybrid: as long as traversal preserves the DFS property, we
use the global shared history, but a state can also optionally clone and
keep its own history if necessary. It's a matter of adding an optional,
potentially NULL pointer to a "local history". All this is very nicely
hidden away from "normal" code.
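Roughly (a sketch; local_hist is a hypothetical field, not part of this
series):

        struct bpf_verifier_state {
                ...
                /* [start, end) into env->insn_hist, shared while DFS order holds */
                u32 insn_hist_start;
                u32 insn_hist_end;
                /* hypothetical: privately cloned history, NULL unless this
                 * state had to diverge from the shared global array
                 */
                struct bpf_insn_hist_entry *local_hist;
        };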
> Let instruction history consume more memory. It's a better long-term trade-off.
Before we decide this, let me collect stats on how much memory we use
for jmp_history with and without my change. I'll need to add a bit of
temporary code to veristat and verifier to collect this, but it
shouldn't take much effort. OK?
> We don't do strict DFS today.
> The speculative execution analysis is DFS, but it visits paths
> multiple times, so it's not a canonical DFS.
Not sure I follow. It's still a DFS, we just branch out more.
But again, let's look at data first. I'll get back with numbers soon.
> It probably doesn't break this particular insn_hist approach,
> but it still feels too fragile to rely on the DFS assumption long term.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-11-09 15:20 ` Eduard Zingerman
@ 2023-11-09 17:32 ` Andrii Nakryiko
2023-11-09 17:38 ` Eduard Zingerman
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:32 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > Given verifier checks actual value, r0 has to be precise, so we need to
> > > propagate precision properly.
> > >
> > > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> I don't follow why this is necessary, could you please conjure
> an example showing that current behavior is not safe?
> This example could be used as a test case, as this change
> seems to not be covered by test cases.
We rely on callbacks to return a specific value (0 or 1, for example),
and the kernel code uses, or might use, that value. So if we rely on the
specific value of a register, it has to be precise. Marking r0 as precise
will have implications for other registers from which r0 was derived,
which in turn affects state pruning. If r0 and its ancestors are not
precise, we might erroneously assume some states are safe and prune them,
even though they are not.
I'll see if I can come up with a simple and quick test. I can always
drop this change, it was a bit of a drive-by bug I noticed while
looking for other issues.
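To illustrate the contract in question (a sketch, assuming bpf_loop's
documented callback signature and its enforced [0, 1] return range):

        static long cb(u64 index, void *ctx)
        {
                /* 0 - continue, 1 - stop; the verifier checks that R0 at
                 * callback exit fits [0, 1], which is why that value has
                 * to be tracked precisely
                 */
                return index >= 10 ? 1 : 0;
        }

        ...
        bpf_loop(100, cb, NULL, 0);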
>
> > > ---
> > > kernel/bpf/verifier.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index fbb779583d52..098ba0e1a6ff 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -9739,6 +9739,12 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
> > > verbose(env, "R0 not a scalar value\n");
> > > return -EACCES;
> > > }
> > > +
> > > + /* we are going to enforce precise value, mark r0 precise */
> > > + err = mark_chain_precision(env, BPF_REG_0);
> > > + if (err)
> > > + return err;
> > > +
> > > if (!tnum_in(range, r0->var_off)) {
> > > verbose_invalid_scalar(env, r0, &range, "callback return", "R0");
> > > return -EINVAL;
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills
2023-11-09 15:20 ` Eduard Zingerman
@ 2023-11-09 17:37 ` Andrii Nakryiko
2023-11-09 17:54 ` Eduard Zingerman
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:37 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve it in
> > situations where this is possible. E.g., when spilling register as
> > 1/2/4-byte subslots on the stack, all the remaining bytes in the stack
> > slot do not automatically become unknown. If we knew they contained
> > zeroes, we can preserve those STACK_ZERO markers.
> >
> > Add a helper mark_stack_slot_misc(), similar to scrub_spilled_slot(),
> > but that doesn't overwrite either STACK_INVALID nor STACK_ZERO. Note
> > that we need to take into account possibility of being in unprivileged
> > mode, in which case STACK_INVALID is forced to STACK_MISC for correctness,
> > as treating STACK_INVALID as equivalent STACK_MISC is only enabled in
> > privileged mode.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Could you please add a test case?
>
sure
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > @@ -1355,6 +1355,21 @@ static void scrub_spilled_slot(u8 *stype)
> > *stype = STACK_MISC;
> > }
> >
> > +/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
> > + * case they are equivalent, or it's STACK_ZERO, in which case we preserve
> > + * more precise STACK_ZERO.
> > + * Note, in unprivileged mode leaving STACK_INVALID is wrong, so we take
> > + * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
> > + */
> > +static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
>
> Nitpick: I find this name misleading, maybe something like "remove_spill_mark"?
remove_spill_mark is even more misleading, no? There are also DYNPTR
and ITER stack slots.
maybe mark_stack_slot_scalar (though that's a bit misleading as well,
as can be understood as marking slot as spilled SCALAR_VALUE
register)? not sure, I think "slot_misc" is close enough as an
approximation of what it's doing, modulo ZERO/INVALID
>
> [...]
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-11-09 17:32 ` Andrii Nakryiko
@ 2023-11-09 17:38 ` Eduard Zingerman
2023-11-09 17:50 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 17:38 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, 2023-11-09 at 09:32 -0800, Andrii Nakryiko wrote:
> On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > > Given verifier checks actual value, r0 has to be precise, so we need to
> > > > propagate precision properly.
> > > >
> > > > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > I don't follow why this is necessary, could you please conjure
> > an example showing that current behavior is not safe?
> > This example could be used as a test case, as this change
> > seems to not be covered by test cases.
>
> We rely on callbacks to return specific value (0 or 1, for example),
> and use or might use that in kernel code. So if we rely on the
> specific value of a register, it has to be precise. Marking r0 as
> precise will have implications on other registers from which r0 was
> derived. This might have implications on state pruning and stuff. If
> r0 and its ancestors are not precise, we might erroneously assume some
> states are safe and prune them, even though they are not.
The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
bpf_loop itself returns the number of completed iterations. However, the
return value of bpf_loop as modeled by the verifier is an unbounded scalar.
The same goes for the map for_each helper.
I'm not sure we have callback calling functions that can expose this as a
safety issue.
[...]
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore
2023-11-09 15:20 ` Eduard Zingerman
@ 2023-11-09 17:41 ` Andrii Nakryiko
2023-11-09 19:34 ` Eduard Zingerman
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:41 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, Nov 9, 2023 at 7:21 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
> > stack from slot that has register spilled into it and that register has
> > a constant value zero, preserve that zero and mark spilled register as
> > precise for that. This makes spilled const zero register and STACK_ZERO
> > cases equivalent in their behavior.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Could you please add a test case?
>
There is already at least one test case that relies on this behavior
:) But yep, I'll add a dedicated test.
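The dedicated test would exercise something like (BPF asm sketch, untested):

        "r0 = 0;"
        "*(u64 *)(r10 - 8) = r0;"       /* spill a known-zero register */
        "r1 = *(u32 *)(r10 - 8);"       /* partial fill: with this patch r1
                                         * stays a known 0 instead of
                                         * becoming an unknown scalar */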
> [...]
>
> > ---
> > kernel/bpf/verifier.c | 25 +++++++++++++++++++++----
> > 1 file changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 0eecc6b3109c..8cfe060e4938 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -4958,22 +4958,39 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> > copy_register_state(&state->regs[dst_regno], reg);
> > state->regs[dst_regno].subreg_def = subreg_def;
> > } else {
> [...]
> > +
> > + if (spill_cnt == size &&
> > + tnum_is_const(reg->var_off) && reg->var_off.value == 0) {
> > + __mark_reg_const_zero(&state->regs[dst_regno]);
> > + /* this IS register fill, so keep insn_flags */
> > + } else if (zero_cnt == size) {
> > + /* similarly to mark_reg_stack_read(), preserve zeroes */
> > + __mark_reg_const_zero(&state->regs[dst_regno]);
> > + insn_flags = 0; /* not restoring original register state */
> > + } else {
> > + mark_reg_unknown(env, state->regs, dst_regno);
> > + insn_flags = 0; /* not restoring original register state */
> > + }
>
> Condition for this branch is (off % BPF_REG_SIZE != 0) || size != spill_size,
> is it necessary to check for some unusual offsets, e.g. off % BPF_REG_SIZE == 7
> or something like that?
I don't think so. We rely on all the bytes we are reading being either
spills (and thus spill_cnt == size), in which case verifier logic makes
sure the spill is at a slot boundary (off % BPF_REG_SIZE == 0), or all
STACK_ZERO, and then zero_cnt == size, in which case we know it's zero.
Unless I missed something else?
>
> [...]
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
2023-11-09 15:21 ` Eduard Zingerman
@ 2023-11-09 17:43 ` Andrii Nakryiko
2023-11-09 17:44 ` Eduard Zingerman
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:43 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, Nov 9, 2023 at 7:21 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > track aligned STACK_ZERO cases as imprecise spilled registers
>
> Great improvement.
thanks!
> Could you please add a test case?
sure, though I guess I'd have to rely on verifier state printing logic
for this, is that ok?
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers
2023-11-09 17:43 ` Andrii Nakryiko
@ 2023-11-09 17:44 ` Eduard Zingerman
0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 17:44 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, 2023-11-09 at 09:43 -0800, Andrii Nakryiko wrote:
> sure, though I guess I'd have to rely on verifier state printing logic
> for this, is that ok?
Sure, thank you.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-11-09 17:38 ` Eduard Zingerman
@ 2023-11-09 17:50 ` Andrii Nakryiko
2023-11-09 17:58 ` Alexei Starovoitov
2023-11-09 18:00 ` Eduard Zingerman
0 siblings, 2 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 17:50 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, Nov 9, 2023 at 9:38 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2023-11-09 at 09:32 -0800, Andrii Nakryiko wrote:
> > On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > > > Given verifier checks actual value, r0 has to be precise, so we need to
> > > > > propagate precision properly.
> > > > >
> > > > > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > >
> > > I don't follow why this is necessary, could you please conjure
> > > an example showing that current behavior is not safe?
> > > This example could be used as a test case, as this change
> > > seems to not be covered by test cases.
> >
> > We rely on callbacks to return a specific value (0 or 1, for example),
> > and the kernel code uses, or might use, that value. So if we rely on the
> > specific value of a register, it has to be precise. Marking r0 as precise
> > will have implications for other registers from which r0 was derived,
> > which in turn affects state pruning. If r0 and its ancestors are not
> > precise, we might erroneously assume some states are safe and prune them,
> > even though they are not.
>
> The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
> bpf_loop itself returns the number of completed iterations. However, the
> return value of bpf_loop as modeled by the verifier is an unbounded scalar.
> The same goes for the map for_each helper.
The return value of bpf_loop() is a different thing from the return value
of bpf_loop's callback. Right now the bpf_loop implementation in the kernel
does
ret = callback(...);
/* return value: 0 - continue, 1 - stop and return */
if (ret)
return i + 1;
So yes, it doesn't explicitly rely on the return value being exactly 1,
but that's only due to the above implementation. The verifier is meant to
enforce the contract, and the protocol is that bpf_loop and other callback
calling helpers should be able to rely on this value.
I think we have the same problem in check_return_code() for entry BPF
programs. So let me take this one out of this patch set and post a new
one concentrating on this particular issue. I've been meaning to use
umin/umax for return value checking anyway, so it might be a good
opportunity to do that as well.
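Roughly what I mean by umin/umax-based checking (a sketch, not the final
form; retval_min/retval_max are placeholders):

        /* instead of tnum_in(range, r0->var_off): */
        if (r0->umin_value < retval_min || r0->umax_value > retval_max) {
                verbose_invalid_scalar(env, r0, &range, "callback return", "R0");
                return -EINVAL;
        }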
>
> I'm not sure we have callback calling functions that can expose this as a
> safety issue.
Even if we can't exploit it today, it's breaking the protocol and the
guarantees that the verifier provides, so I think this needs to be fixed.
>
> [...]
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills
2023-11-09 17:37 ` Andrii Nakryiko
@ 2023-11-09 17:54 ` Eduard Zingerman
0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 17:54 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, 2023-11-09 at 09:37 -0800, Andrii Nakryiko wrote:
[...]
> > > @@ -1355,6 +1355,21 @@ static void scrub_spilled_slot(u8 *stype)
> > > *stype = STACK_MISC;
> > > }
> > >
> > > +/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which
> > > + * case they are equivalent, or it's STACK_ZERO, in which case we preserve
> > > + * more precise STACK_ZERO.
> > > + * Note, in unprivileged mode leaving STACK_INVALID is wrong, so we take
> > > + * env->allow_ptr_leaks into account and force STACK_MISC, if necessary.
> > > + */
> > > +static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype)
> >
> > Nitpick: I find this name misleading, maybe something like "remove_spill_mark"?
>
> remove_spill_mark is even more misleading, no? there is also DYNPTR
> and ITER stack slots?
Right, forgot about those...
>
> maybe mark_stack_slot_scalar (though that's a bit misleading as well,
> as can be understood as marking slot as spilled SCALAR_VALUE
> register)? not sure, I think "slot_misc" is close enough as an
> approximation of what it's doing, modulo ZERO/INVALID
maybe_mark_stack_slot_misc?
The other similar function is named 'scrub_spilled_slot'.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-11-09 17:50 ` Andrii Nakryiko
@ 2023-11-09 17:58 ` Alexei Starovoitov
2023-11-09 18:01 ` Andrii Nakryiko
2023-11-09 18:00 ` Eduard Zingerman
1 sibling, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2023-11-09 17:58 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 9:50 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> >
> > The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
> > bpf_loop itself returns the number of completed iterations. However, the
> > return value of bpf_loop as modeled by the verifier is an unbounded scalar.
> > The same goes for the map for_each helper.
>
> The return value of bpf_loop() is a different thing from the return value
> of bpf_loop's callback. Right now the bpf_loop implementation in the kernel
> does
>
> ret = callback(...);
> /* return value: 0 - continue, 1 - stop and return */
> if (ret)
> return i + 1;
>
> So yes, it doesn't explicitly rely on the return value being exactly 1,
> but that's only due to the above implementation. The verifier is meant to
> enforce the contract, and the protocol is that bpf_loop and other callback
> calling helpers should be able to rely on this value.
>
> I think we have the same problem in check_return_code() for entry BPF
> programs. So let me take this one out of this patch set and post a new
> one concentrating on this particular issue. I've been meaning to use
> umin/umax for return value checking anyway, so it might be a good
> opportunity to do that as well.
Just like Ed, I was also initially confused by this.
As you said, check_return_code() has the same problem.
I think the issue this patch (and a similar fix in check_return_code())
should be fixing is the case where one state went through return code
checking, but another state with a potentially out-of-range r0 got pruned
since r0 wasn't marked precise.
Not sure how hard it would be to come up with a selftest for such a scenario.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-11-09 17:50 ` Andrii Nakryiko
2023-11-09 17:58 ` Alexei Starovoitov
@ 2023-11-09 18:00 ` Eduard Zingerman
1 sibling, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 18:00 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, 2023-11-09 at 09:50 -0800, Andrii Nakryiko wrote:
> On Thu, Nov 9, 2023 at 9:38 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Thu, 2023-11-09 at 09:32 -0800, Andrii Nakryiko wrote:
> > > On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > >
> > > > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > > > > Given verifier checks actual value, r0 has to be precise, so we need to
> > > > > > propagate precision properly.
> > > > > >
> > > > > > Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
> > > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > >
> > > > I don't follow why this is necessary, could you please conjure
> > > > an example showing that current behavior is not safe?
> > > > This example could be used as a test case, as this change
> > > > seems to not be covered by test cases.
> > >
> > > We rely on callbacks to return specific value (0 or 1, for example),
> > > and use or might use that in kernel code. So if we rely on the
> > > specific value of a register, it has to be precise. Marking r0 as
> > > precise will have implications on other registers from which r0 was
> > > derived. This might have implications on state pruning and stuff. If
> > > r0 and its ancestors are not precise, we might erroneously assume some
> > > states are safe and prune them, even though they are not.
> >
> > The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
> > bpf_loop itself returns the number of completed iterations. However, the
> > return value of bpf_loop as modeled by the verifier is an unbounded scalar.
> > Same for the map's for-each helper.
>
> return value of bpf_loop() is a different thing from return value of
> bpf_loop's callback. Right now bpf_loop implementation in kernel does
>
> ret = callback(...);
> /* return value: 0 - continue, 1 - stop and return */
> if (ret)
> return i + 1;
>
> So yes, it doesn't explicitly rely on the return value being 1, just due to
> the above implementation. But the verifier is meant to enforce that, and
> the protocol is that bpf_loop and other callback-calling helpers
> may rely on this value.
>
> I think we have the same problem in check_return_code() for entry BPF
> programs. So let me take this one out of this patch set and post a
> new one concentrating on this particular issue. I've been meaning to
> use umin/umax for return value checking anyway, so this might be a good
> opportunity to do it.
The precision mark is necessary if the verifier makes some decisions
based on the value, e.g. whether a certain code path would be taken or
whether a specific value would be used as a pointer offset.
Neither is true for existing callbacks: the value returned by a callback
does not affect any verifier decisions.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-11-09 17:58 ` Alexei Starovoitov
@ 2023-11-09 18:01 ` Andrii Nakryiko
2023-11-09 18:03 ` Eduard Zingerman
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 18:01 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 9:58 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 9:50 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > >
> > > The r0 returned from bpf_loop's callback tells bpf_loop to stop iterating;
> > > bpf_loop itself returns the number of completed iterations. However, the
> > > return value of bpf_loop as modeled by the verifier is an unbounded scalar.
> > > Same for the map's for-each helper.
> >
> > return value of bpf_loop() is a different thing from return value of
> > bpf_loop's callback. Right now bpf_loop implementation in kernel does
> >
> > ret = callback(...);
> > /* return value: 0 - continue, 1 - stop and return */
> > if (ret)
> > return i + 1;
> >
> > So yes, it doesn't explicitly rely on the return value being 1, just due to
> > the above implementation. But the verifier is meant to enforce that, and
> > the protocol is that bpf_loop and other callback-calling helpers
> > may rely on this value.
> >
> > I think we have the same problem in check_return_code() for entry BPF
> > programs. So let me take this one out of this patch set and post a
> > new one concentrating on this particular issue. I've been meaning to
> > use umin/umax for return value checking anyway, so this might be a good
> > opportunity to do it.
>
> Just like Ed I was also initially confused by this.
> As you said check_return_code() has the same problem.
> I think the issue this patch and similar in check_return_code()
> should be fixing is the case where one state went through
> ret code checking, but another state with potentially out-of-range
> r0 got state pruned since r0 wasn't marked precise.
Right.
> Not sure how hard it would be to come up with a selftest for such a scenario.
Yep, I'll think of something. Lots of tests to come up with :)
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return
2023-11-09 18:01 ` Andrii Nakryiko
@ 2023-11-09 18:03 ` Eduard Zingerman
0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 18:03 UTC (permalink / raw)
To: Andrii Nakryiko, Alexei Starovoitov
Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Kernel Team
On Thu, 2023-11-09 at 10:01 -0800, Andrii Nakryiko wrote:
[...]
> > Just like Ed I was also initially confused by this.
> > As you said check_return_code() has the same problem.
> > I think the issue this patch and similar in check_return_code()
> > should be fixing is the case where one state went through
> > ret code checking, but another state with potentially out-of-range
> > r0 got state pruned since r0 wasn't marked precise.
>
> Right.
>
> > Not sure how hard it would be to come up with a selftest for such a scenario.
>
> Yep, I'll think of something. Lots of tests to come up with :)
Hm, the range argument is convincing, thank you.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
2023-11-09 17:20 ` Andrii Nakryiko
@ 2023-11-09 18:20 ` Eduard Zingerman
2023-11-10 5:48 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 18:20 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
Tao Lyu
On Thu, 2023-11-09 at 09:20 -0800, Andrii Nakryiko wrote:
[...]
> > > struct bpf_insn_hist_entry {
> > > - u32 prev_idx;
> > > u32 idx;
> > > + /* insn idx can't be bigger than 1 million */
> > > + u32 prev_idx : 22;
> > > + /* special flags, e.g., whether insn is doing register stack spill/load */
> > > + u32 flags : 10;
> > > };
> >
> > Nitpick: maybe use separate bit-fields for frameno and spi instead of
> > flags? Or add dedicated accessor functions?
>
> I wanted to keep it very uniform so that push_insn_history() doesn't
> know about all such details. It just has "flags". We might use these
> flags for some other use cases, though if we run out of bits we'll
> probably just expand bpf_insn_hist_entry and refactor existing code
> anyways. So, basically, I didn't want to over-engineer this bit too
> much :)
Well, maybe hide "(hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK"
behind an accessor?
[...]
> > > +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> > > + int insn_flags)
> > > {
> > > struct bpf_insn_hist_entry *p;
> > > size_t alloc_size;
> > >
> > > - if (!is_jmp_point(env, env->insn_idx))
> > > + /* combine instruction flags if we already recorded this instruction */
> > > + if (cur->insn_hist_end > cur->insn_hist_start &&
> > > + (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> > > + p->idx == env->insn_idx &&
> > > + p->prev_idx == env->prev_insn_idx) {
> > > + p->flags |= insn_flags;
> >
> > Nitpick: maybe add an assert to check that frameno/spi are not or'ed?
>
> ok, something like
>
> WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS | INSN_F_FRAMENOMASK |
> (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
>
> ?
Something like this, yes.
[...]
> > > @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> > >
> > > /* Mark slots affected by this stack write. */
> > > for (i = 0; i < size; i++)
> > > - state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> > > - type;
> > > + state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> > > + insn_flags = 0; /* not a register spill */
> > > }
> > > +
> > > + if (insn_flags)
> > > + return push_insn_history(env, env->cur_state, insn_flags);
> >
> > Maybe add a check that insn is BPF_ST or BPF_STX here?
> > Only these cases are supported by backtrack_insn() while
> > check_mem_access() is called from multiple places.
>
> seems like the wrong place to enforce that check_stack_write_fixed_off()
> is called only for those instructions?
check_stack_write_fixed_off() is called from check_stack_write(), which
is called from check_mem_access(), which might trigger
check_stack_write_fixed_off() when called with the BPF_WRITE flag and
a pointer to the stack as an argument.
This happens for ST and STX, but also from check_helper_call(),
process_iter_arg() (and maybe other places).
Speaking of which, should this be handled in backtrack_insn()?
> [...]
>
> trimming is good
Sigh... sorry, really tried to trim everything today.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-11-09 17:28 ` Andrii Nakryiko
@ 2023-11-09 19:29 ` Alexei Starovoitov
2023-11-09 19:49 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2023-11-09 19:29 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
>
> If we ever break DFS property, we can easily change this. Or we can
> even have a hybrid: as long as traversal preserves DFS property, we
> use global shared history, but we can also optionally clone and have
> our own history if necessary. It's a matter of adding optional
> potentially NULL pointer to "local history". All this is very nicely
> hidden away from "normal" code.
If we can "easily change this" then let's make it last and optional patch.
So we can revert in the future when we need to take non-DFS path.
> But again, let's look at data first. I'll get back with numbers soon.
Sure. I think memory increase due to more tracking is ok.
I suspect it won't cause a 2x increase. Likely a few %.
The last time I checked the main memory hog is states stashed for pruning.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore
2023-11-09 17:41 ` Andrii Nakryiko
@ 2023-11-09 19:34 ` Eduard Zingerman
0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-09 19:34 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team
On Thu, 2023-11-09 at 09:41 -0800, Andrii Nakryiko wrote:
> On Thu, Nov 9, 2023 at 7:21 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote:
> > > Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
> > > stack from slot that has register spilled into it and that register has
> > > a constant value zero, preserve that zero and mark spilled register as
> > > precise for that. This makes spilled const zero register and STACK_ZERO
> > > cases equivalent in their behavior.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > Could you please add a test case?
> >
>
> There is already at least one test case that relies on this behavior
> :) But yep, I'll add a dedicated test.
Thank you. Having a dedicated test always helps with debugging, should
something go wrong.
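Something along these lines, perhaps (illustrative asm only; exact
registers and offsets are arbitrary):

  r0 = 0;                    /* r0 is a known constant zero       */
  *(u64 *)(r10 - 8) = r0;    /* full 8-byte spill of const zero   */
  r1 = *(u32 *)(r10 - 8);    /* partial 4-byte fill               */

With this patch r1 should end up as a known zero (and the spilled
register marked precise), exactly as if the slot had been STACK_ZERO;
before, such a partial fill would produce an unbounded scalar.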
[...]
> > Condition for this branch is (off % BPF_REG_SIZE != 0) || size != spill_size,
> > is it necessary to check for some unusual offsets, e.g. off % BPF_REG_SIZE == 7
> > or something like that?
>
> I don't think so. We rely on all bytes we are reading to be either
> spills (and thus spill_cnt == size), in which case verifier logic
> makes sure we have spill at slot boundary (off % BPF_REG_SIZE == 0).
> Or it's all STACK_ZERO, and then zero_cnt == size, in which case we
> know it's zero.
>
> Unless I missed something else?
False alarm, 'slot' is derived from 'off' and the loop checks
'type = stype[(slot - i) % BPF_REG_SIZE];', sorry for the noise.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-11-09 19:29 ` Alexei Starovoitov
@ 2023-11-09 19:49 ` Andrii Nakryiko
2023-11-09 20:39 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 19:49 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> >
> > If we ever break DFS property, we can easily change this. Or we can
> > even have a hybrid: as long as traversal preserves DFS property, we
> > use global shared history, but we can also optionally clone and have
> > our own history if necessary. It's a matter of adding optional
> > potentially NULL pointer to "local history". All this is very nicely
> > hidden away from "normal" code.
>
> If we can "easily change this" then let's make it last and optional patch.
> So we can revert in the future when we need to take non-DFS path.
Ok, sounds good. I'll reorder and put it last, you can decide whether
to apply it or not that way.
>
> > But again, let's look at data first. I'll get back with numbers soon.
>
> Sure. I think memory increase due to more tracking is ok.
> I suspect it won't cause a 2x increase. Likely a few %.
> The last time I checked the main memory hog is states stashed for pruning.
So I'm back with data. See verifier.c changes I did at the bottom,
just to double check I'm not missing something major. I count the
number of allocations (but that's an underestimate that doesn't take
into account realloc), total number of instruction history entries for
entire program verification, and then also peak "depth" of instruction
history. Note that entries should be multiplied by 8 to get the amount
of bytes (and that's not counting per-allocation overhead).
Here are top 20 results, sorted by number of allocs for Meta-internal,
Cilium, and selftests. BEFORE is without added STACK_ACCESS tracking
and STACK_ZERO optimization. AFTER is with all the patches of this
patch set applied.
It's a few megabytes of memory allocation, which in itself is probably
not a big deal. But it's a lot of unnecessary memory allocations,
basically at least 2x the total number of states, that we can save. And
instead we'd have just a few reallocs to size the global jump history to
an orders-of-magnitude smaller peak entry count.
And if we ever decide to track more stuff similar to
INSN_F_STACK_ACCESS, we won't have to worry about more allocations or
more memory usage, because the absolute worst case is our global
history will be up to 1 million entries tops. We can track some *code
path dependent* per-instruction information for *each simulated
instruction* easily without having to think twice about this. Which I
think is a nice liberating thought in itself justifying this change.
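(For scale: 1 million entries times 8 bytes per entry caps the whole
global history at ~8MB in the absolute worst case.)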
META BEFORE
===========
[vmuser@archvm bpf]$ sudo veristat -e prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-before-results-fbcode.csv -s jmp_allocs -n 20
Program              Insns  States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
------------------  ------  ------  ---------------  ----------------------  ---------------------
syar_file_open      712974   51407           154982                  546559                    742
balancer_ingress    339626   26535            92061                  106593                     61
vip_filter          457002   33010            83432                  201396                    276
tw_twfw_egress      511127   16733            81485                  382989                   4379
tw_twfw_tc_eg       511113   16732            81484                  382987                   4379
tw_twfw_ingress     500095   16223            80974                  381708                   4379
tw_twfw_tc_in       500095   16223            80974                  381708                   4379
adns                384816   11145            41882                  128399                     52
cls_fg_dscp         217709   13908            28163                   59291                    117
edgewall            179715   12607            26886                   51134                     74
mount_audit          87915    1938            19198                  104412                    315
xdpdecap             62648    5577            17530                   18315                     38
xdpdecap             62648    5577            17530                   18315                     38
xdpdecap             62648    5577            17530                   18315                     38
xdpdecap             58507    4687            16527                   18613                     40
syar_lsm_file_open  167772    1836            12720                   90107                   2107
tcdecapstats         38691    3112            10991                   12371                     34
twfw_connect6        44399    1974             9864                   36320                   1797
twfw_sendmsg6        44399    1974             9864                   36320                   1797
on_pytorch_event    100370    2153             7102                   25661                    199
META AFTER
==========
[vmuser@archvm bpf]$ sudo veristat -e prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-after-results-fbcode.csv -s jmp_allocs -n 20
Program              Insns  States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
------------------  ------  ------  ---------------  ----------------------  ---------------------
syar_file_open      707473   51263           154488                  431397                    464
balancer_ingress    334452   26438            91881                  107822                    114
vip_filter          457002   33010            83432                  287548                    374
adns                384816   11145            41882                   86357                    274
tw_twfw_egress      212071    8504            33886                  161518                   5184
tw_twfw_ingress     212069    8504            33886                  161515                   5184
tw_twfw_tc_eg       212064    8504            33886                  161515                   5184
tw_twfw_tc_in       212069    8504            33886                  161515                   5184
cls_fg_dscp         213184   13702            27722                  118594                    281
mount_audit          87915    1938            19198                  103835                    202
xdpdecap             62648    5577            17530                   25723                     96
xdpdecap             62648    5577            17530                   25723                     96
xdpdecap             62648    5577            17530                   25723                     96
xdpdecap             58507    4687            16527                   18674                     54
syar_lsm_file_open  151813    1667            11530                   21841                    215
tcdecapstats         38691    3112            10991                   12373                     35
twfw_connect6        44399    1974             9864                   63516                   4378
twfw_sendmsg6        44399    1974             9864                   63516                   4378
edgewall             55783    3999             8467                   31874                    187
on_pytorch_event    102649    2152             7140                   33563                    250
CILIUM BEFORE
=============
[vmuser@archvm bpf]$ sudo veristat -e file,prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-before-results-cilium.csv -s jmp_allocs -n 20
File           Program                         Insns  States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
-------------  ------------------------------  -----  ------  ---------------  ----------------------  ---------------------
bpf_xdp.o      tail_lb_ipv6                    80441    3647             6976                   12471                    105
bpf_xdp.o      tail_lb_ipv4                    39492    2430             4581                    8117                    105
bpf_host.o     tail_nodeport_nat_egress_ipv4   22460    1469             2926                    5302                    226
bpf_overlay.o  tail_nodeport_nat_egress_ipv4   22718    1475             2926                    5285                    227
bpf_host.o     tail_handle_nat_fwd_ipv4        21022    1289             2498                    4924                    236
bpf_lxc.o      tail_handle_nat_fwd_ipv4        21022    1289             2498                    4924                    236
bpf_overlay.o  tail_handle_nat_fwd_ipv4        20524    1271             2465                    4844                    221
bpf_xdp.o      tail_rev_nodeport_lb6           16173    1010             1934                    3137                     65
bpf_host.o     tail_handle_nat_fwd_ipv6        15433     905             1802                    3662                    224
bpf_lxc.o      tail_handle_nat_fwd_ipv6        15433     905             1802                    3662                    224
bpf_xdp.o      tail_handle_nat_fwd_ipv4        12917     875             1638                    2986                    245
bpf_xdp.o      tail_nodeport_nat_egress_ipv4   13027     868             1628                    2957                    227
bpf_xdp.o      tail_handle_nat_fwd_ipv6        13515     715             1391                    2492                    215
bpf_xdp.o      tail_nodeport_nat_ingress_ipv4   7617     522              985                    1736                     63
bpf_host.o     cil_to_netdev                    6047     362              783                    1334                     48
bpf_xdp.o      tail_rev_nodeport_lb4            6808     403              761                    1319                     77
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6   7575     383              722                    1261                     63
bpf_host.o     tail_nodeport_nat_ingress_ipv4   5526     366              693                    1196                     40
bpf_lxc.o      tail_nodeport_nat_ingress_ipv4   5526     366              693                    1196                     40
bpf_overlay.o  tail_nodeport_nat_ingress_ipv4   5526     366              693                    1196                     40
CILIUM AFTER
============
[vmuser@archvm bpf]$ sudo veristat -e file,prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-after-results-cilium.csv -s jmp_allocs -n 20
File           Program                         Insns  States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
-------------  ------------------------------  -----  ------  ---------------  ----------------------  ---------------------
bpf_xdp.o      tail_lb_ipv6                    78058    3523             6810                   19903                    192
bpf_xdp.o      tail_lb_ipv4                    36367    2251             4293                   11975                    205
bpf_host.o     tail_nodeport_nat_egress_ipv4   19862    1293             2568                    7021                    305
bpf_overlay.o  tail_nodeport_nat_egress_ipv4   19490    1275             2552                    6990                    303
bpf_xdp.o      tail_rev_nodeport_lb6           15847     990             1909                    4533                     96
bpf_xdp.o      tail_handle_nat_fwd_ipv4        12443     849             1608                    4177                    315
bpf_xdp.o      tail_nodeport_nat_egress_ipv4   12096     809             1523                    4140                    290
bpf_xdp.o      tail_handle_nat_fwd_ipv6        13264     702             1378                    3535                    271
bpf_host.o     tail_handle_nat_fwd_ipv4        10479     670             1325                    3891                    315
bpf_lxc.o      tail_handle_nat_fwd_ipv4        10479     670             1325                    3891                    315
bpf_host.o     tail_handle_nat_fwd_ipv6        11375     643             1292                    3523                    288
bpf_lxc.o      tail_handle_nat_fwd_ipv6        11375     643             1292                    3523                    288
bpf_overlay.o  tail_handle_nat_fwd_ipv4        10114     638             1266                    3738                    274
bpf_xdp.o      tail_nodeport_nat_ingress_ipv4   5900     413              773                    1891                     92
bpf_xdp.o      tail_rev_nodeport_lb4            6739     396              750                    1899                    136
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6   7395     374              711                    1732                     97
bpf_host.o     cil_to_netdev                    4578     249              512                    1223                     97
bpf_host.o     tail_handle_ipv6_from_host       4168     244              499                    1338                     91
bpf_host.o     tail_handle_ipv4_from_host       3434     231              477                    1170                     97
bpf_host.o     tail_nodeport_nat_ingress_ipv4   3534     243              474                    1344                     77
SELFTESTS BEFORE
================
[vmuser@archvm bpf]$ sudo veristat -e file,prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-before-results-selftests.csv -s jmp_allocs -n 20
File                                       Program                          Insns  States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
-----------------------------------------  -----------------------------  -------  ------  ---------------  ----------------------  ---------------------
pyperf600_nounroll.bpf.linked3.o           on_event                        533132   34227            67332                  201368                  15100
pyperf600.bpf.linked3.o                    on_event                        475837   22259            48488                  125533                   9675
verifier_loops1.bpf.linked3.o              loop_after_a_conditional_jump  1000001   25000            25000                  499983                 499999
strobemeta.bpf.linked3.o                   on_event                        180697    4780            20185                  115993                   9208
pyperf180.bpf.linked3.o                    on_event                        118245    8422            17797                   36579                   2881
test_verif_scale1.bpf.linked3.o            balancer_ingress                546742    8636            16439                   43048                    270
test_verif_scale3.bpf.linked3.o            balancer_ingress                837487    8636            16439                   43048                    270
xdp_synproxy_kern.bpf.linked3.o            syncookie_xdp                    85116    5162            15308                   30910                     65
xdp_synproxy_kern.bpf.linked3.o            syncookie_tc                     82848    5107            15239                   30812                     66
strobemeta_nounroll2.bpf.linked3.o         on_event                        104119    3820            12128                   72765                   3388
test_cls_redirect.bpf.linked3.o            cls_redirect                     65594    4230            11683                   18353                     50
pyperf100.bpf.linked3.o                    on_event                         72685    5123            11208                   23467                   1630
test_cls_redirect_subprogs.bpf.linked3.o   cls_redirect                     57790    4063             9711                   17719                     93
loop3.bpf.linked3.o                        while_true                     1000001    9663             9663                  111106                 111111
test_verif_scale2.bpf.linked3.o            balancer_ingress                767498    3048             9144                   21812                     90
strobemeta_subprogs.bpf.linked3.o          on_event                         52685    1653             5890                   40180                   1636
pyperf50.bpf.linked3.o                     on_event                         36980    2623             5708                   11967                    880
strobemeta_nounroll1.bpf.linked3.o         on_event                         49337    1706             5522                   32940                   1552
loop1.bpf.linked3.o                        nested_loops                    361349    5504             5504                   90288                  90300
pyperf_subprogs.bpf.linked3.o              on_event                         36029    2526             5425                   11195                    890
SELFTESTS AFTER
===============
[vmuser@archvm bpf]$ sudo veristat -e file,prog,insns,states,jmp_allocs,jmp_total,jmp_peak -R ~/insn-hist-after-results-selftests.csv -s jmp_allocs -n 20
File                                       Program                          Insns  States  Jumphist allocs  Jumphist total entries  Jumphist peak entries
-----------------------------------------  -----------------------------  -------  ------  ---------------  ----------------------  ---------------------
pyperf600_nounroll.bpf.linked3.o           on_event                        533132   34227            67332                  260526                  18282
pyperf600.bpf.linked3.o                    on_event                        475837   22259            48488                  183880                  13455
verifier_loops1.bpf.linked3.o              loop_after_a_conditional_jump  1000001   25000            25000                   25001                  25002
strobemeta.bpf.linked3.o                   on_event                        176036    4734            19835                  147680                  13666
pyperf180.bpf.linked3.o                    on_event                        118245    8422            17797                   50683                   3763
test_verif_scale1.bpf.linked3.o            balancer_ingress                546742    8636            16439                   43048                    270
test_verif_scale3.bpf.linked3.o            balancer_ingress                837487    8636            16439                   43048                    270
xdp_synproxy_kern.bpf.linked3.o            syncookie_tc                     81241    5155            15347                   46763                    148
xdp_synproxy_kern.bpf.linked3.o            syncookie_xdp                    82297    5157            15321                   49715                    148
strobemeta_nounroll2.bpf.linked3.o         on_event                        104119    3820            12128                   80924                   3724
test_cls_redirect.bpf.linked3.o            cls_redirect                     65401    4212            11662                   24862                     79
pyperf100.bpf.linked3.o                    on_event                         72685    5123            11208                   36290                   3201
test_cls_redirect_subprogs.bpf.linked3.o   cls_redirect                     57790    4063             9711                   24814                    146
loop3.bpf.linked3.o                        while_true                     1000001    9663             9663                  333321                 333335
test_verif_scale2.bpf.linked3.o            balancer_ingress                767498    3048             9144                   28787                    180
strobemeta_subprogs.bpf.linked3.o          on_event                         52685    1653             5890                   45798                   1776
pyperf50.bpf.linked3.o                     on_event                         36980    2623             5708                   18175                   1691
strobemeta_nounroll1.bpf.linked3.o         on_event                         49337    1706             5522                   38476                   1718
loop1.bpf.linked3.o                        nested_loops                    361349    5504             5504                   90288                  90300
pyperf_subprogs.bpf.linked3.o              on_event                         36029    2526             5425                   18130                   1885
Stats counting diff:
$ git show -- kernel
commit febebc9586c08820fa927b1628454b2709e98e3f (HEAD)
Author: Andrii Nakryiko <andrii@kernel.org>
Date: Thu Nov 9 11:02:40 2023 -0800
[EXPERIMENT] bpf: add jump/insns history stats
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b688043e5460..d0f25f36221e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2026,6 +2026,10 @@ static int pop_stack(struct bpf_verifier_env *env, int *prev_insn_idx,
 		return -ENOENT;
 
 	if (cur) {
+		env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
+		env->jmp_hist_total += cur->insn_hist_end - cur->insn_hist_start;
+		env->jmp_hist_allocs += 1;
+
 		err = copy_verifier_state(cur, &head->st);
 		if (err)
 			return err;
@@ -3648,6 +3653,8 @@ static int push_jmp_history(struct bpf_verifier_env *env,
 	p->idx = env->insn_idx;
 	p->prev_idx = env->prev_insn_idx;
 	cur->insn_hist_end++;
+
+	env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
 	return 0;
 }
@@ -17205,6 +17212,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			WARN_ONCE(new->branches != 1,
 				  "BUG is_state_visited:branches_to_explore=%d insn %d\n", new->branches, insn_idx);
 
+	env->jmp_hist_total += cur->insn_hist_end - cur->insn_hist_start;
+	env->jmp_hist_allocs += 1;
+
 	cur->parent = new;
 	cur->first_insn_idx = insn_idx;
 	cur->insn_hist_start = cur->insn_hist_end;
@@ -20170,10 +20180,12 @@ static void print_verification_stats(struct bpf_verifier_env *env)
 		verbose(env, "\n");
 	}
 	verbose(env, "processed %d insns (limit %d) max_states_per_insn %d "
-		"total_states %d peak_states %d mark_read %d\n",
+		"total_states %d peak_states %d mark_read %d "
+		"jmp_allocs %d jmp_total %d jmp_peak %d\n",
 		env->insn_processed, BPF_COMPLEXITY_LIMIT_INSNS,
 		env->max_states_per_insn, env->total_states,
-		env->peak_states, env->longest_mark_read_walk);
+		env->peak_states, env->longest_mark_read_walk,
+		env->jmp_hist_allocs, env->jmp_hist_total, env->jmp_hist_peak);
 }
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-11-09 19:49 ` Andrii Nakryiko
@ 2023-11-09 20:39 ` Andrii Nakryiko
2023-11-09 22:05 ` Alexei Starovoitov
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 20:39 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > >
> > > If we ever break DFS property, we can easily change this. Or we can
> > > even have a hybrid: as long as traversal preserves DFS property, we
> > > use global shared history, but we can also optionally clone and have
> > > our own history if necessary. It's a matter of adding optional
> > > potentially NULL pointer to "local history". All this is very nicely
> > > hidden away from "normal" code.
> >
> > If we can "easily change this" then let's make it last and optional patch.
> > So we can revert in the future when we need to take non-DFS path.
>
> Ok, sounds good. I'll reorder and put it last, you can decide whether
> to apply it or not that way.
>
> >
> > > But again, let's look at data first. I'll get back with numbers soon.
> >
> > Sure. I think memory increase due to more tracking is ok.
> > I suspect it won't cause a 2x increase. Likely a few %.
> > The last time I checked the main memory hog is states stashed for pruning.
>
> So I'm back with data. See verifier.c changes I did at the bottom,
> just to double check I'm not missing something major. I count the
> number of allocations (but that's an underestimate that doesn't take
> into account realloc), total number of instruction history entries for
> entire program verification, and then also peak "depth" of instruction
> history. Note that entries should be multiplied by 8 to get the amount
> of bytes (and that's not counting per-allocation overhead).
>
> Here are top 20 results, sorted by number of allocs for Meta-internal,
> Cilium, and selftests. BEFORE is without added STACK_ACCESS tracking
> and STACK_ZERO optimization. AFTER is with all the patches of this
> patch set applied.
>
> It's a few megabytes of memory allocation, which in itself is probably
> not a big deal. But it's a lot of unnecessary memory allocations,
> basically at least 2x the total number of states, that we can save. And
> instead we'd have just a few reallocs to size the global jump history to
> an orders-of-magnitude smaller peak entry count.
>
> And if we ever decide to track more stuff similar to
> INSN_F_STACK_ACCESS, we won't have to worry about more allocations or
> more memory usage, because the absolute worst case is our global
> history will be up to 1 million entries tops. We can track some *code
> path dependent* per-instruction information for *each simulated
> instruction* easily without having to think twice about this. Which I
> think is a nice liberating thought in itself justifying this change.
>
>
Gmail butchered the tables. See the Github gist ([0]) for a properly formatted version.
[0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5
>
> Stats counting diff:
>
> $ git show -- kernel
> commit febebc9586c08820fa927b1628454b2709e98e3f (HEAD)
> Author: Andrii Nakryiko <andrii@kernel.org>
> Date: Thu Nov 9 11:02:40 2023 -0800
>
> [EXPERIMENT] bpf: add jump/insns history stats
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index b688043e5460..d0f25f36221e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2026,6 +2026,10 @@ static int pop_stack(struct bpf_verifier_env *env, int *prev_insn_idx,
>  		return -ENOENT;
>
>  	if (cur) {
> +		env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
> +		env->jmp_hist_total += cur->insn_hist_end - cur->insn_hist_start;
> +		env->jmp_hist_allocs += 1;
> +
>  		err = copy_verifier_state(cur, &head->st);
>  		if (err)
>  			return err;
> @@ -3648,6 +3653,8 @@ static int push_jmp_history(struct bpf_verifier_env *env,
>  	p->idx = env->insn_idx;
>  	p->prev_idx = env->prev_insn_idx;
>  	cur->insn_hist_end++;
> +
> +	env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
>  	return 0;
>  }
>
> @@ -17205,6 +17212,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>  			WARN_ONCE(new->branches != 1,
>  				  "BUG is_state_visited:branches_to_explore=%d insn %d\n", new->branches, insn_idx);
>
> +	env->jmp_hist_total += cur->insn_hist_end - cur->insn_hist_start;
> +	env->jmp_hist_allocs += 1;
> +
>  	cur->parent = new;
>  	cur->first_insn_idx = insn_idx;
>  	cur->insn_hist_start = cur->insn_hist_end;
> @@ -20170,10 +20180,12 @@ static void print_verification_stats(struct bpf_verifier_env *env)
>  		verbose(env, "\n");
>  	}
>  	verbose(env, "processed %d insns (limit %d) max_states_per_insn %d "
> -		"total_states %d peak_states %d mark_read %d\n",
> +		"total_states %d peak_states %d mark_read %d "
> +		"jmp_allocs %d jmp_total %d jmp_peak %d\n",
>  		env->insn_processed, BPF_COMPLEXITY_LIMIT_INSNS,
>  		env->max_states_per_insn, env->total_states,
> -		env->peak_states, env->longest_mark_read_walk);
> +		env->peak_states, env->longest_mark_read_walk,
> +		env->jmp_hist_allocs, env->jmp_hist_total, env->jmp_hist_peak);
>  }
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-11-09 20:39 ` Andrii Nakryiko
@ 2023-11-09 22:05 ` Alexei Starovoitov
2023-11-09 22:57 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2023-11-09 22:05 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 12:39 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > >
> > > > If we ever break DFS property, we can easily change this. Or we can
> > > > even have a hybrid: as long as traversal preserves DFS property, we
> > > > use global shared history, but we can also optionally clone and have
> > > > our own history if necessary. It's a matter of adding optional
> > > > potentially NULL pointer to "local history". All this is very nicely
> > > > hidden away from "normal" code.
> > >
> > > If we can "easily change this" then let's make it last and optional patch.
> > > So we can revert in the future when we need to take non-DFS path.
> >
> > Ok, sounds good. I'll reorder and put it last, you can decide whether
> > to apply it or not that way.
> >
> > >
> > > > But again, let's look at data first. I'll get back with numbers soon.
> > >
> > > Sure. I think memory increase due to more tracking is ok.
> > > I suspect it won't cause a 2x increase. Likely a few %.
> > > The last time I checked the main memory hog is states stashed for pruning.
> >
> > So I'm back with data. See verifier.c changes I did at the bottom,
> > just to double check I'm not missing something major. I count the
> > number of allocations (but that's an underestimate that doesn't take
> > into account realloc), total number of instruction history entries for
> > entire program verification, and then also peak "depth" of instruction
> > history. Note that entries should be multiplied by 8 to get the amount
> > of bytes (and that's not counting per-allocation overhead).
> >
> > Here are top 20 results, sorted by number of allocs for Meta-internal,
> > Cilium, and selftests. BEFORE is without added STACK_ACCESS tracking
> > and STACK_ZERO optimization. AFTER is with all the patches of this
> > patch set applied.
> >
> > It's a few megabytes of memory allocation, which in itself is probably
> > not a big deal. But it's a lot of unnecessary memory allocations,
> > basically at least 2x the total number of states, that we can save. And
> > instead we'd have just a few reallocs to size the global jump history to
> > an orders-of-magnitude smaller peak entry count.
> >
> > And if we ever decide to track more stuff similar to
> > INSN_F_STACK_ACCESS, we won't have to worry about more allocations or
> > more memory usage, because the absolute worst case is our global
> > history will be up to 1 million entries tops. We can track some *code
> > path dependent* per-instruction information for *each simulated
> > instruction* easily without having to think twice about this. Which I
> > think is a nice liberating thought in itself justifying this change.
> >
> >
>
> Gmail butchered the tables. See the Github gist ([0]) for a properly formatted version.
>
> [0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5
I think 'peak insn history' is the one to look for, since
it indicates total peak memory consumption. Right?
It seems the numbers point out a bug in number collection or
a bug in implementation.
before:
verifier_loops1.bpf.linked3.o peak=499999
loop3.bpf.linked3.o peak=111111
which makes sense, since both tests hit 1m insn.
I can see where 1/2 and 1/9 come from based on asm.
after:
verifier_loops1.bpf.linked3.o peak=25002
loop3.bpf.linked3.o peak=333335
So the 1st test got a 20 times smaller memory footprint
while the 2nd was 3 times higher.
Both are similar infinite loops.
The 1st one is:
l1_%=: r0 += 1; \
goto l1_%=; \
My understanding is that there should be all 500k jmps in history with
or without these patches.
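(Back-of-the-envelope: loops1's body is 2 insns per iteration, one of
them the jump, so the 1m insn limit gives ~1m/2 = 500k recorded jumps;
loop3's loop presumably spans ~9 insns per jump, hence ~1m/9 = ~111k.)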
So now I'm more worried about the correctness of the 1st patch.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-11-09 22:05 ` Alexei Starovoitov
@ 2023-11-09 22:57 ` Andrii Nakryiko
2023-11-11 4:29 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-09 22:57 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 2:06 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 12:39 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > >
> > > > > If we ever break DFS property, we can easily change this. Or we can
> > > > > even have a hybrid: as long as traversal preserves DFS property, we
> > > > > use global shared history, but we can also optionally clone and have
> > > > > our own history if necessary. It's a matter of adding optional
> > > > > potentially NULL pointer to "local history". All this is very nicely
> > > > > hidden away from "normal" code.
> > > >
> > > > If we can "easily change this" then let's make it last and optional patch.
> > > > So we can revert in the future when we need to take non-DFS path.
> > >
> > > Ok, sounds good. I'll reorder and put it last, you can decide whether
> > > to apply it or not that way.
> > >
> > > >
> > > > > But again, let's look at data first. I'll get back with numbers soon.
> > > >
> > > > Sure. I think memory increase due to more tracking is ok.
> > > > I suspect it won't cause a 2x increase. Likely a few %.
> > > > The last time I checked the main memory hog is states stashed for pruning.
> > >
> > > So I'm back with data. See verifier.c changes I did at the bottom,
> > > just to double check I'm not missing something major. I count the
> > > number of allocations (but that's an underestimate that doesn't take
> > > into account realloc), total number of instruction history entries for
> > > entire program verification, and then also peak "depth" of instruction
> > > history. Note that entries should be multiplied by 8 to get the amount
> > > of bytes (and that's not counting per-allocation overhead).
> > >
> > > Here are top 20 results, sorted by number of allocs for Meta-internal,
> > > Cilium, and selftests. BEFORE is without added STACK_ACCESS tracking
> > > and STACK_ZERO optimization. AFTER is with all the patches of this
> > > patch set applied.
> > >
> > > It's a few megabytes of memory allocation, which in itself is probably
> > > not a big deal. But it's a lot of unnecessary memory allocations,
> > > basically at least 2x the total number of states, that we can save. And
> > > instead we'd have just a few reallocs to size the global jump history to
> > > an orders-of-magnitude smaller peak entry count.
> > >
> > > And if we ever decide to track more stuff similar to
> > > INSN_F_STACK_ACCESS, we won't have to worry about more allocations or
> > > more memory usage, because the absolute worst case is our global
> > > history will be up to 1 million entries tops. We can track some *code
> > > path dependent* per-instruction information for *each simulated
> > > instruction* easily without having to think twice about this. Which I
> > > think is a nice liberating thought in itself justifying this change.
> > >
> > >
> >
> > Gmail butchered the tables. See the Github gist ([0]) for a properly formatted version.
> >
> > [0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5
>
> I think 'peak insn history' is the one to look for, since
> it indicates total peak memory consumption. Right?
Hm... not really? Peak here is the longest sequence of recorded jumps
from the root state to any "current" state. I calculated that to know
how big the global history would need to be.
But it's definitely not total peak memory consumption, because there
will be states enqueued in the stack still to be processed, and we keep
their jmp_history around; see push_stack() and the copy_verifier_state()
we do in there.
> It seems the numbers point out a bug in number collection or
> a bug in implementation.
yeah, a bug in the accounting implementation, I suspect. I think I'm not
handling failing states properly.
I'll double-check and fix it up, but basically only failing BPF
programs should have bad accounting.
>
> before:
> verifier_loops1.bpf.linked3.o peak=499999
> loop3.bpf.linked3.o peak=111111
>
> which makes sense, since both tests hit the 1m insn limit.
> I can see where 1/2 and 1/9 come from based on asm.
>
> after:
> verifier_loops1.bpf.linked3.o peak=25002
> loop3.bpf.linked3.o peak=333335
>
> So the 1st test got a 20 times smaller memory footprint
> while the 2nd was 3 times higher.
>
> Both are similar infinite loops.
>
> The 1st one is:
> l1_%=: r0 += 1; \
> goto l1_%=; \
>
> My understanding is that there should be all 500k jmps in history with
> or without these patches.
>
> So now I'm more worried about the correctness of the 1st patch.
I'll look closer at what's going on and will report back.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
2023-11-09 18:20 ` Eduard Zingerman
@ 2023-11-10 5:48 ` Andrii Nakryiko
2023-11-12 1:57 ` Andrii Nakryiko
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-10 5:48 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
Tao Lyu
On Thu, Nov 9, 2023 at 10:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2023-11-09 at 09:20 -0800, Andrii Nakryiko wrote:
> [...]
> > > > struct bpf_insn_hist_entry {
> > > > - u32 prev_idx;
> > > > u32 idx;
> > > > + /* insn idx can't be bigger than 1 million */
> > > > + u32 prev_idx : 22;
> > > > + /* special flags, e.g., whether insn is doing register stack spill/load */
> > > > + u32 flags : 10;
> > > > };
> > >
> > > Nitpick: maybe use separate bit-fields for frameno and spi instead of
> > > flags? Or add dedicated accessor functions?
> >
> > I wanted to keep it very uniform so that push_insn_history() doesn't
> > know about all such details. It just has "flags". We might use these
> > flags for some other use cases, though if we run out of bits we'll
> > probably just expand bpf_insn_hist_entry and refactor existing code
> > anyways. So, basically, I didn't want to over-engineer this bit too
> > much :)
>
> Well, maybe hide "(hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK"
> behind an accessor?
I'll add a single-line helper function just to not be a PITA, but I
don't think it's worth it. There are two places we do this, one next
to the other within the same function. This helper is just going to
add mental overhead and won't really help us with anything.
>
> [...]
>
> > > > +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> > > > + int insn_flags)
> > > > {
> > > > struct bpf_insn_hist_entry *p;
> > > > size_t alloc_size;
> > > >
> > > > - if (!is_jmp_point(env, env->insn_idx))
> > > > + /* combine instruction flags if we already recorded this instruction */
> > > > + if (cur->insn_hist_end > cur->insn_hist_start &&
> > > > + (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> > > > + p->idx == env->insn_idx &&
> > > > + p->prev_idx == env->prev_insn_idx) {
> > > > + p->flags |= insn_flags;
> > >
> > > Nitpick: maybe add an assert to check that frameno/spi are not or'ed?
> >
> > ok, something like
> >
> > WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS | INSN_F_FRAMENOMASK |
> > (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
> >
> > ?
>
> Something like this, yes.
>
I added it, and I hate it. It's just visual noise. Feels too
paranoid, I'll probably drop it.
> [...]
>
> > > > @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> > > >
> > > > /* Mark slots affected by this stack write. */
> > > > for (i = 0; i < size; i++)
> > > > - state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> > > > - type;
> > > > + state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> > > > + insn_flags = 0; /* not a register spill */
> > > > }
> > > > +
> > > > + if (insn_flags)
> > > > + return push_insn_history(env, env->cur_state, insn_flags);
> > >
> > > Maybe add a check that insn is BPF_ST or BPF_STX here?
> > > Only these cases are supported by backtrack_insn() while
> > > check_mem_access() is called from multiple places.
> >
> > seems like the wrong place to enforce that check_stack_write_fixed_off()
> > is called only for those instructions?
>
> check_stack_write_fixed_off() is called from check_stack_write(), which
> is called from check_mem_access(), which might trigger
> check_stack_write_fixed_off() when called with the BPF_WRITE flag and
> a pointer to the stack as an argument.
> This happens for ST and STX, but also from check_helper_call(),
> process_iter_arg() (and maybe other places).
> Speaking of which, should this be handled in backtrack_insn()?
Note that we set insn_flags only for cases where we do an actual
register spill (save_register_state() calls for non-fake registers). If
a register spill is somehow possible from a helper call, we'll be in
much bigger trouble elsewhere.
>
> > [...]
> >
> > trimming is good
>
> Sigh... sorry, really tried to trim everything today.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states
2023-11-09 22:57 ` Andrii Nakryiko
@ 2023-11-11 4:29 ` Andrii Nakryiko
0 siblings, 0 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-11 4:29 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eduard Zingerman, Andrii Nakryiko, bpf, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Kernel Team
On Thu, Nov 9, 2023 at 2:57 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 2:06 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Nov 9, 2023 at 12:39 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > If we ever break DFS property, we can easily change this. Or we can
> > > > > > even have a hybrid: as long as traversal preserves DFS property, we
> > > > > > use global shared history, but we can also optionally clone and have
> > > > > > our own history if necessary. It's a matter of adding optional
> > > > > > potentially NULL pointer to "local history". All this is very nicely
> > > > > > hidden away from "normal" code.
> > > > >
> > > > > If we can "easily change this" then let's make it last and optional patch.
> > > > > So we can revert in the future when we need to take non-DFS path.
> > > >
> > > > Ok, sounds good. I'll reorder and put it last, you can decide whether
> > > > to apply it or not that way.
> > > >
> > > > >
> > > > > > But again, let's look at data first. I'll get back with numbers soon.
> > > > >
> > > > > Sure. I think memory increase due to more tracking is ok.
> > > > > I suspect it won't cause a 2x increase. Likely a few %.
> > > > > The last time I checked the main memory hog is states stashed for pruning.
> > > >
> > > > So I'm back with data. See verifier.c changes I did at the bottom,
> > > > just to double check I'm not missing something major. I count the
> > > > number of allocations (but that's an underestimate that doesn't take
> > > > into account realloc), total number of instruction history entries for
> > > > entire program verification, and then also peak "depth" of instruction
> > > > history. Note that entries should be multiplied by 8 to get the amount
> > > > of bytes (and that's not counting per-allocation overhead).
> > > >
> > > > Here are top 20 results, sorted by number of allocs for Meta-internal,
> > > > Cilium, and selftests. BEFORE is without added STACK_ACCESS tracking
> > > > and STACK_ZERO optimization. AFTER is with all the patches of this
> > > > patch set applied.
> > > >
> > > > It's a few megabytes of memory allocation, which in itself is probably
> > > > not a big deal. But it's a lot of unnecessary memory allocations,
> > > > basically at least 2x the total number of states, that we can save. And
> > > > instead we'd have just a few reallocs to size the global jump history to
> > > > an orders-of-magnitude smaller peak entry count.
> > > >
> > > > And if we ever decide to track more stuff similar to
> > > > INSN_F_STACK_ACCESS, we won't have to worry about more allocations or
> > > > more memory usage, because the absolute worst case is our global
> > > > history will be up to 1 million entries tops. We can track some *code
> > > > path dependent* per-instruction information for *each simulated
> > > > instruction* easily without having to think twice about this. Which I
> > > > think is a nice liberating thought in itself justifying this change.
> > > >
> > > >
> > >
> > > Gmail butchered the tables. See the Github gist ([0]) for a properly formatted version.
> > >
> > > [0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5
> >
> > I think 'peak insn history' is the one to look for, since
> > it indicates total peak memory consumption. Right?
>
> Hm... not really? Peak here is the longest sequence of recorded jumps
> from the root state to any "current" state. I calculated that to know
> how big the global history would need to be.
>
> But it's definitely not total peak memory consumption, because there
> will be states enqueued in the stack still to be processed, and we keep
> their jmp_history around; see push_stack() and the copy_verifier_state()
> we do in there.
>
> > It seems the numbers point out a bug in number collection or
> > a bug in implementation.
>
> yeah, a bug in the accounting implementation, I suspect. I think I'm not
> handling failing states properly.
>
> I'll double-check and fix it up, but basically only failing BPF
> programs should have bad accounting.
Alexei, your intuition was right! There is indeed a bug in patch #2.
What's funny is that it's conceptually the same bug I just fixed in the
backtracking logic ([0]). Basically, we cannot rely on checking
instruction indices for equality to make sure it's exactly the same
verified instruction (because it could be an instruction with the same
index, but from earlier in the verification history). I also have a bit
of accounting imprecision for the last failed state in the jmp_total
stat, but I didn't bother fixing it because it's just off by a few, and
only for failed validations.
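(Concretely: with the shared history, the entry at insn_hist_end - 1 can
be a leftover from an earlier loop iteration that recorded the same
idx/prev_idx pair, so the old equality check would merge new flags into
that stale entry instead of appending a fresh one.)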
The good news is that this bug actually doesn't affect results at all,
except for that one verifier_loops1.c case (for the same reason that the
bug in [0] wasn't reported earlier: it's a very rare situation for
real-world BPF programs). See updated results in [1].
[0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@kernel.org/
[1] https://gist.github.com/anakryiko/4e61d28f1a2caecea4315e50e4346120
Anyways, the fix is pretty straightforward, if not the most elegant.
I'll roll it into patch #2 for the next revision (it will be patch #1,
because I moved the common history refactoring to be the last one, as
agreed). Still need to add the tests and other things Eduard requested.
commit f11d05fb037f6a69eff8c7d4eff4c422374af37f (HEAD -> bpf-verif-jmp-history)
Author: Andrii Nakryiko <andrii@kernel.org>
Date: Fri Nov 10 20:12:05 2023 -0800
[FIX] remember current insn hist entry to reuse for flags
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 75d9507a8a9f..42a7619dcf80 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -693,6 +693,7 @@ struct bpf_verifier_env {
 	 */
 	char tmp_str_buf[TMP_STR_BUF_LEN];
 	struct bpf_insn_hist_entry *insn_hist;
+	struct bpf_insn_hist_entry *cur_hist_ent;
 	u32 insn_hist_cap;
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2878077e0a54..eebda0367dca 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3652,13 +3652,8 @@ static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_s
 	size_t alloc_size;
 
 	/* combine instruction flags if we already recorded this instruction */
-	if (cur->insn_hist_end > cur->insn_hist_start &&
-	    (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
-	    p->idx == env->insn_idx &&
-	    p->prev_idx == env->prev_insn_idx) {
-		WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS |
-			     INSN_F_FRAMENO_MASK | (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
-		p->flags |= insn_flags;
+	if (env->cur_hist_ent) {
+		env->cur_hist_ent->flags |= insn_flags;
 		return 0;
 	}
@@ -3676,6 +3671,7 @@ static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_s
 	p->idx = env->insn_idx;
 	p->prev_idx = env->prev_insn_idx;
 	p->flags = insn_flags;
+	env->cur_hist_ent = p;
 	cur->insn_hist_end++;
 
 	env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
@@ -17408,6 +17404,9 @@ static int do_check(struct bpf_verifier_env *env)
 	u8 class;
 	int err;
 
+	/* reset current history entry on each new instruction */
+	env->cur_hist_ent = NULL;
+
 	env->prev_insn_idx = prev_insn_idx;
 	if (env->insn_idx >= insn_cnt) {
 		verbose(env, "invalid insn idx %d insn_cnt %d\n",
>
> >
> > before:
> > verifier_loops1.bpf.linked3.o peak=499999
> > loop3.bpf.linked3.o peak=111111
> >
> > which makes sense, since both tests hit the 1m insn limit.
> > I can see where 1/2 and 1/9 come from based on asm.
> >
> > after:
> > verifier_loops1.bpf.linked3.o peak=25002
> > loop3.bpf.linked3.o peak=333335
> >
> > So the 1st test got a 20 times smaller memory footprint
> > while the 2nd was 3 times higher.
> >
> > Both are similar infinite loops.
> >
> > The 1st one is:
> > l1_%=: r0 += 1; \
> > goto l1_%=; \
> >
> > My understanding is that there should be all 500k jmps in history with
> > or without these patches.
> >
> > So now I'm more worried about the correctness of the 1st patch.
>
> I'll look closer at what's going on and will report back.
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
2023-11-10 5:48 ` Andrii Nakryiko
@ 2023-11-12 1:57 ` Andrii Nakryiko
2023-11-12 14:05 ` Eduard Zingerman
0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2023-11-12 1:57 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
Tao Lyu
On Thu, Nov 9, 2023 at 9:48 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 10:20 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Thu, 2023-11-09 at 09:20 -0800, Andrii Nakryiko wrote:
> > [...]
> > > > > struct bpf_insn_hist_entry {
> > > > > - u32 prev_idx;
> > > > > u32 idx;
> > > > > + /* insn idx can't be bigger than 1 million */
> > > > > + u32 prev_idx : 22;
> > > > > + /* special flags, e.g., whether insn is doing register stack spill/load */
> > > > > + u32 flags : 10;
> > > > > };
> > > >
> > > > Nitpick: maybe use separate bit-fields for frameno and spi instead of
> > > > flags? Or add dedicated accessor functions?
> > >
> > > I wanted to keep it very uniform so that push_insn_history() doesn't
> > > know about all such details. It just has "flags". We might use these
> > > flags for some other use cases, though if we run out of bits we'll
> > > probably just expand bpf_insn_hist_entry and refactor existing code
> > > anyways. So, basically, I didn't want to over-engineer this bit too
> > > much :)
> >
> > Well, maybe hide "(hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK"
> > behind an accessor?
>
> I'll add a single line helper function just to not be PITA, but I
> don't think it's worth it. There are two places we do this, one next
> to the other within the same function. This helper is just going to
> add mental overhead and won't really help us with anything.
>
> >
> > [...]
> >
> > > > > +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
> > > > > + int insn_flags)
> > > > > {
> > > > > struct bpf_insn_hist_entry *p;
> > > > > size_t alloc_size;
> > > > >
> > > > > - if (!is_jmp_point(env, env->insn_idx))
> > > > > + /* combine instruction flags if we already recorded this instruction */
> > > > > + if (cur->insn_hist_end > cur->insn_hist_start &&
> > > > > + (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
> > > > > + p->idx == env->insn_idx &&
> > > > > + p->prev_idx == env->prev_insn_idx) {
> > > > > + p->flags |= insn_flags;
> > > >
> > > > Nitpick: maybe add an assert to check that frameno/spi are not or'ed?
> > >
> > > ok, something like
> > >
> > > WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS | INSN_F_FRAMENO_MASK |
> > > (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
> > >
> > > ?
> >
> > Something like this, yes.
> >
>
> I added it, and I hate it. It's just visual noise. Feels too
> paranoid, I'll probably drop it.
>
I ended up with these changes on top of this patch:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 23dbfb5022ba..d234c6f53741 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3167,6 +3167,21 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
return 0;
}
+static int insn_stack_access_flags(int frameno, int spi)
+{
+ return INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | frameno;
+}
+
+static int insn_stack_access_spi(int insn_flags)
+{
+ return (insn_flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
+}
+
+static int insn_stack_access_frameno(int insn_flags)
+{
+ return insn_flags & INSN_F_FRAMENO_MASK;
+}
+
static void mark_jmp_point(struct bpf_verifier_env *env, int idx)
{
env->insn_aux_data[idx].jmp_point = true;
@@ -3187,6 +3202,7 @@ static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_s
/* combine instruction flags if we already recorded this instruction */
if (env->cur_hist_ent) {
+ WARN_ON_ONCE(env->cur_hist_ent->flags & insn_flags);
env->cur_hist_ent->flags |= insn_flags;
return 0;
}
@@ -3499,8 +3515,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
* that [fp - off] slot contains scalar that needs to be
* tracked with precision
*/
- spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
- fr = hist->flags & INSN_F_FRAMENO_MASK;
+ spi = insn_stack_access_spi(hist->flags);
+ fr = insn_stack_access_frameno(hist->flags);
bt_set_frame_slot(bt, fr, spi);
} else if (class == BPF_STX || class == BPF_ST) {
if (bt_is_reg_set(bt, dreg))
@@ -3512,8 +3528,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
/* scalars can only be spilled into stack */
if (!hist || !(hist->flags & INSN_F_STACK_ACCESS))
return 0;
- spi = (hist->flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
- fr = hist->flags & INSN_F_FRAMENO_MASK;
+ spi = insn_stack_access_spi(hist->flags);
+ fr = insn_stack_access_frameno(hist->flags);
if (!bt_is_frame_slot_set(bt, fr, spi))
return 0;
bt_clear_frame_slot(bt, fr, spi);
@@ -4322,7 +4338,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
int i, slot = -off - 1, spi = slot / BPF_REG_SIZE, err;
struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
struct bpf_reg_state *reg = NULL;
- int insn_flags = INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | state->frameno;
+ int insn_flags = insn_stack_access_flags(state->frameno, spi);
err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE));
if (err)
@@ -4618,7 +4634,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
int i, slot = -off - 1, spi = slot / BPF_REG_SIZE;
struct bpf_reg_state *reg;
u8 *stype, type;
- int insn_flags = insn_stack_access_flags(reg_state->frameno, spi); 
+ int insn_flags = insn_stack_access_flags(reg_state->frameno, spi);
stype = reg_state->stack[spi].slot_type;
reg = &reg_state->stack[spi].spilled_ptr;
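To make the flag packing above concrete, here is a standalone round-trip
check for helpers of this shape. The mask and shift values are illustrative
assumptions (2 bits of frameno, one stack-access bit, 6 bits of spi), not
necessarily the kernel's actual INSN_F_* definitions:

#include <assert.h>
#include <stdio.h>

#define INSN_F_FRAMENO_MASK	0x3	/* frames 0..3 (assumed width) */
#define INSN_F_STACK_ACCESS	(1 << 2)
#define INSN_F_SPI_SHIFT	3
#define INSN_F_SPI_MASK		0x3f	/* slots 0..63 (assumed width) */

static int insn_stack_access_flags(int frameno, int spi)
{
	return INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | frameno;
}

static int insn_stack_access_spi(int insn_flags)
{
	return (insn_flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
}

static int insn_stack_access_frameno(int insn_flags)
{
	return insn_flags & INSN_F_FRAMENO_MASK;
}

int main(void)
{
	int fr, spi;

	/* every (frameno, spi) pair must survive an encode/decode round trip */
	for (fr = 0; fr <= INSN_F_FRAMENO_MASK; fr++) {
		for (spi = 0; spi <= INSN_F_SPI_MASK; spi++) {
			int flags = insn_stack_access_flags(fr, spi);

			assert(flags & INSN_F_STACK_ACCESS);
			assert(insn_stack_access_frameno(flags) == fr);
			assert(insn_stack_access_spi(flags) == spi);
		}
	}
	printf("all round trips ok\n");
	return 0;
}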
> > [...]
> >
> > > > > @@ -4713,9 +4711,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> > > > >
> > > > > /* Mark slots affected by this stack write. */
> > > > > for (i = 0; i < size; i++)
> > > > > - state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] =
> > > > > - type;
> > > > > + state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type;
> > > > > + insn_flags = 0; /* not a register spill */
> > > > > }
> > > > > +
> > > > > + if (insn_flags)
> > > > > + return push_insn_history(env, env->cur_state, insn_flags);
> > > >
> > > > Maybe add a check that insn is BPF_ST or BPF_STX here?
> > > > Only these cases are supported by backtrack_insn() while
> > > > check_mem_access() is called from multiple places.
> > >
> > > seems like a wrong place to enforce that check_stack_write_fixed_off()
> > > is called only for those instructions?
> >
> > check_stack_write_fixed_off() is called from check_stack_write() which
> > is called from check_mem_access() which might trigger
> > check_stack_write_fixed_off() when called with BPF_WRITE flag and
> > pointer to stack as an argument.
> > This happens for ST, STX but also in check_helper_call(),
> > process_iter_arg() (maybe other places).
> > Speaking of which, should this be handled in backtrack_insn()?
>
> Note that we set insn_flags only for cases where we do an actual
> register spill (save_register_state calls for non-fake registers). If
> register spill is possible from a helper call somehow, we'll be in
> much bigger trouble elsewhere.
>
> >
> > > [...]
> > >
> > > trimming is good
> >
> > Sigh... sorry, really tried to trim everything today.
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking
2023-11-12 1:57 ` Andrii Nakryiko
@ 2023-11-12 14:05 ` Eduard Zingerman
0 siblings, 0 replies; 45+ messages in thread
From: Eduard Zingerman @ 2023-11-12 14:05 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
Tao Lyu
On Sat, 2023-11-11 at 17:57 -0800, Andrii Nakryiko wrote:
[...]
> I ended up with these changes on top of this patch:
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 23dbfb5022ba..d234c6f53741 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3167,6 +3167,21 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
> return 0;
> }
>
> +static int insn_stack_access_flags(int frameno, int spi)
> +{
> + return INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | frameno;
> +}
> +
> +static int insn_stack_access_spi(int insn_flags)
> +{
> + return (insn_flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK;
> +}
> +
> +static int insn_stack_access_frameno(int insn_flags)
> +{
> + return insn_flags & INSN_F_FRAMENO_MASK;
> +}
Looks good, thank you.
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2023-11-12 14:05 UTC | newest]
Thread overview: 45+ messages
2023-10-31 5:03 [PATCH bpf-next 0/7] Complete BPF verifier precision tracking support for register spills Andrii Nakryiko
2023-10-31 5:03 ` [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 16:13 ` Alexei Starovoitov
2023-11-09 17:28 ` Andrii Nakryiko
2023-11-09 19:29 ` Alexei Starovoitov
2023-11-09 19:49 ` Andrii Nakryiko
2023-11-09 20:39 ` Andrii Nakryiko
2023-11-09 22:05 ` Alexei Starovoitov
2023-11-09 22:57 ` Andrii Nakryiko
2023-11-11 4:29 ` Andrii Nakryiko
2023-10-31 5:03 ` [PATCH bpf-next 2/7] bpf: support non-r10 register spill/fill to/from stack in precision tracking Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 17:20 ` Andrii Nakryiko
2023-11-09 18:20 ` Eduard Zingerman
2023-11-10 5:48 ` Andrii Nakryiko
2023-11-12 1:57 ` Andrii Nakryiko
2023-11-12 14:05 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 3/7] bpf: enforce precision for r0 on callback return Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 17:32 ` Andrii Nakryiko
2023-11-09 17:38 ` Eduard Zingerman
2023-11-09 17:50 ` Andrii Nakryiko
2023-11-09 17:58 ` Alexei Starovoitov
2023-11-09 18:01 ` Andrii Nakryiko
2023-11-09 18:03 ` Eduard Zingerman
2023-11-09 18:00 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 4/7] bpf: fix check for attempt to corrupt spilled pointer Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 5/7] bpf: preserve STACK_ZERO slots on partial reg spills Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 17:37 ` Andrii Nakryiko
2023-11-09 17:54 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 6/7] bpf: preserve constant zero when doing partial register restore Andrii Nakryiko
2023-11-09 15:20 ` Eduard Zingerman
2023-11-09 17:41 ` Andrii Nakryiko
2023-11-09 19:34 ` Eduard Zingerman
2023-10-31 5:03 ` [PATCH bpf-next 7/7] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Andrii Nakryiko
2023-10-31 5:22 ` Andrii Nakryiko
2023-11-01 7:56 ` Jiri Olsa
2023-11-01 16:27 ` Andrii Nakryiko
2023-11-02 9:54 ` Jiri Olsa
2023-11-09 15:21 ` Eduard Zingerman
2023-11-09 17:43 ` Andrii Nakryiko
2023-11-09 17:44 ` Eduard Zingerman