BPF List
* [PATCH bpf-next v3 0/7] IRQ save/restore
@ 2024-11-27 16:58 Kumar Kartikeya Dwivedi
From: Kumar Kartikeya Dwivedi @ 2024-11-27 16:58 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

This set introduces support for managing IRQ state from BPF programs.
Two new kfuncs, bpf_local_irq_save and bpf_local_irq_restore, are
introduced to enable this functionality.

The intended use cases are writing IRQ-safe data structures (e.g. a
memory allocator) natively in BPF programs, and using these kfuncs in
new spin locking primitives that are planned for the next few weeks.
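A minimal user-space sketch of the save/restore semantics these kfuncs expose, modeled on the kernel's local_irq_save()/local_irq_restore() pair. The actual kfunc signatures and verifier rules are in patch 4, which is not part of this excerpt; sim_irqs_enabled, sim_irq_save(), sim_irq_restore() and sim_nested_ok() are hypothetical stand-ins, not kernel or BPF APIs. The sketch only illustrates why each save needs its own flags slot: nested sections come out correctly only if each restore uses the flags recorded by its matching save.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool sim_irqs_enabled = true;

/* Model of a save: record the current state in *flags, then disable. */
static void sim_irq_save(unsigned long *flags)
{
	*flags = sim_irqs_enabled ? 1UL : 0UL;
	sim_irqs_enabled = false;
}

/* Model of a restore: go back to whatever state *flags recorded. */
static void sim_irq_restore(const unsigned long *flags)
{
	sim_irqs_enabled = (*flags != 0);
}

/* Nested critical sections, each level with its own flags slot. */
static bool sim_nested_ok(void)
{
	unsigned long outer, inner;

	sim_irq_save(&outer);     /* enabled -> disabled, outer records 1 */
	sim_irq_save(&inner);     /* already disabled, inner records 0 */
	sim_irq_restore(&inner);  /* stays disabled, since inner recorded 0 */
	if (sim_irqs_enabled)
		return false;
	sim_irq_restore(&outer);  /* back to enabled */
	return sim_irqs_enabled;
}
```

Restoring with the inner flags leaves interrupts disabled, which is exactly what the enclosing outer section still expects.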

The set begins with some refactoring patches before the actual
functionality is introduced. Patch 1 consolidates all resource-related
state in bpf_verifier_state, moving it out of bpf_func_state.

Patch 2 refactors the acquire and release functions for reference state
so they can be reused for other resource types without duplication.

After this, patch 3 refactors the stack slot liveness marking logic so
that it is shared between dynptrs and iterators, in preparation for
reusing the same logic for the IRQ flag object on the stack.

Finally, patches 4 and 7 introduce the new kfuncs and their selftests.
Patch 5 makes the error message for resource leaks at BPF_EXIT a bit
clearer. Patch 6 expands the coverage of the existing preempt-disable
selftest to sleepable kfuncs.

See individual patches for more details.

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20241127153306.1484562-1-memxor@gmail.com

 * Drop REF_TYPE_LOCK_MASK
 * Add kfunc declarations to selftest to silence s390 CI errors

v1 -> v2
v1: https://lore.kernel.org/bpf/20241121005329.408873-1-memxor@gmail.com

 * Drop reference -> resource renaming in the verifier (Eduard, Alexei)
 * Change verifier log for check_resource_leak for BPF_EXIT (Eduard)
 * Remove id parameter from acquire_resource_state, read s->id (Eduard)
 * Rename erase to release for reference state (Eduard)
 * Move resource state to bpf_verifier_state (Eduard, Alexei)
 * Drop unnecessary casting to/from u64 in helpers (Eduard)
 * Add test for arg != PTR_TO_STACK (Eduard)
 * Drop now redundant tests (Eduard)
 * Address some other misc nits
 * Add Reviewed-by and Acked-by from Eduard

Kumar Kartikeya Dwivedi (7):
  bpf: Consolidate locks and reference state in verifier state
  bpf: Refactor {acquire,release}_reference_state
  bpf: Refactor mark_{dynptr,iter}_read
  bpf: Introduce support for bpf_local_irq_{save,restore}
  bpf: Improve verifier log for resource leak on exit
  selftests/bpf: Expand coverage of preempt tests to sleepable kfunc
  selftests/bpf: Add IRQ save/restore tests

 include/linux/bpf_verifier.h                  |  19 +-
 kernel/bpf/helpers.c                          |  17 +
 kernel/bpf/log.c                              |  12 +-
 kernel/bpf/verifier.c                         | 531 +++++++++++++-----
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 .../selftests/bpf/progs/exceptions_fail.c     |   4 +-
 tools/testing/selftests/bpf/progs/irq.c       | 397 +++++++++++++
 .../selftests/bpf/progs/preempt_lock.c        |  26 +-
 .../selftests/bpf/progs/verifier_spin_lock.c  |   2 +-
 9 files changed, 863 insertions(+), 147 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/irq.c


base-commit: c8d02b547363880d996f80c38cc8b997c7b90725
-- 
2.43.5



* [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state
From: Kumar Kartikeya Dwivedi @ 2024-11-27 16:58 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

Currently, the state for RCU read locks and preemption lives in
bpf_verifier_state, while the lock and pointer reference state remains
in bpf_func_state. There is no particular reason to keep the latter in
bpf_func_state. Additionally, it is copied into the new frame's state,
and copied back into the caller frame's state, every time the verifier
processes a pseudo call instruction. This is wasteful, given that this
state is global for a given verification state / path.

Move all resource and reference related state into the
bpf_verifier_state structure in this patch, in preparation for
introducing new reference state types in the future.
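With this state living in bpf_verifier_state, refsafe() is called once from states_equal() and also compares the lock, preemption and RCU counters. Below is a simplified user-space sketch of that comparison. The sketch_* types, SKETCH_ID_MAP_SIZE and the fixed-size refs array are assumptions for illustration. The id check follows the consistent-renaming idea behind the verifier's check_ids(): an old id must always pair with the same current id. Unlike the verifier, which shares env->idmap_scratch with its register and stack checks, the sketch starts each call with a fresh id map.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ID_MAP_SIZE 16 /* hypothetical capacity */

struct sketch_ref { unsigned int id; int type; };

/* Hypothetical, trimmed-down model of the per-path state. */
struct sketch_state {
	struct sketch_ref refs[8];
	unsigned int acquired_refs;
	unsigned int active_locks;
	unsigned int active_preempt_locks;
	bool active_rcu_lock;
};

struct sketch_idmap {
	struct { unsigned int old, cur; } map[SKETCH_ID_MAP_SIZE];
};

/* Both ids zero, or a consistent one-to-one old -> cur pairing. */
static bool sketch_check_ids(unsigned int old_id, unsigned int cur_id,
			     struct sketch_idmap *m)
{
	unsigned int i;

	if (!!old_id != !!cur_id)
		return false;
	if (!old_id)
		return true;
	for (i = 0; i < SKETCH_ID_MAP_SIZE; i++) {
		if (!m->map[i].old) {          /* first sighting: record pair */
			m->map[i].old = old_id;
			m->map[i].cur = cur_id;
			return true;
		}
		if (m->map[i].old == old_id)   /* seen before: must match */
			return m->map[i].cur == cur_id;
		if (m->map[i].cur == cur_id)   /* cur id already taken */
			return false;
	}
	return false; /* ran out of map slots */
}

static bool sketch_refsafe(const struct sketch_state *old,
			   const struct sketch_state *cur)
{
	struct sketch_idmap idmap;
	unsigned int i;

	memset(&idmap, 0, sizeof(idmap));
	if (old->acquired_refs != cur->acquired_refs ||
	    old->active_locks != cur->active_locks ||
	    old->active_preempt_locks != cur->active_preempt_locks ||
	    old->active_rcu_lock != cur->active_rcu_lock)
		return false;
	assert(old->acquired_refs <= sizeof(old->refs) / sizeof(old->refs[0]));
	for (i = 0; i < old->acquired_refs; i++)
		if (!sketch_check_ids(old->refs[i].id, cur->refs[i].id, &idmap) ||
		    old->refs[i].type != cur->refs[i].type)
			return false;
	return true;
}
```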

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  11 ++--
 kernel/bpf/log.c             |  11 ++--
 kernel/bpf/verifier.c        | 112 ++++++++++++++++-------------------
 3 files changed, 64 insertions(+), 70 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f4290c179bee..af64b5415df8 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -315,9 +315,6 @@ struct bpf_func_state {
 	u32 callback_depth;
 
 	/* The following fields should be last. See copy_func_state() */
-	int acquired_refs;
-	int active_locks;
-	struct bpf_reference_state *refs;
 	/* The state of the stack. Each element of the array describes BPF_REG_SIZE
 	 * (i.e. 8) bytes worth of stack memory.
 	 * stack[0] represents bytes [*(r10-8)..*(r10-1)]
@@ -419,9 +416,13 @@ struct bpf_verifier_state {
 	u32 insn_idx;
 	u32 curframe;
 
-	bool speculative;
+	struct bpf_reference_state *refs;
+	u32 acquired_refs;
+	u32 active_locks;
+	u32 active_preempt_locks;
 	bool active_rcu_lock;
-	u32 active_preempt_lock;
+
+	bool speculative;
 	/* If this state was ever pointed-to by other state's loop_entry field
 	 * this flag would be set to true. Used to avoid freeing such states
 	 * while they are still in use.
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 4a858fdb6476..8b52e5b7504c 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -756,6 +756,7 @@ static void print_reg_state(struct bpf_verifier_env *env,
 void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_state *state,
 			  bool print_all)
 {
+	struct bpf_verifier_state *vstate = env->cur_state;
 	const struct bpf_reg_state *reg;
 	int i;
 
@@ -843,11 +844,11 @@ void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_st
 			break;
 		}
 	}
-	if (state->acquired_refs && state->refs[0].id) {
-		verbose(env, " refs=%d", state->refs[0].id);
-		for (i = 1; i < state->acquired_refs; i++)
-			if (state->refs[i].id)
-				verbose(env, ",%d", state->refs[i].id);
+	if (vstate->acquired_refs && vstate->refs[0].id) {
+		verbose(env, " refs=%d", vstate->refs[0].id);
+		for (i = 1; i < vstate->acquired_refs; i++)
+			if (vstate->refs[i].id)
+				verbose(env, ",%d", vstate->refs[i].id);
 	}
 	if (state->in_callback_fn)
 		verbose(env, " cb");
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1c4ebb326785..f8313e95eb8e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1279,15 +1279,17 @@ static void *realloc_array(void *arr, size_t old_n, size_t new_n, size_t size)
 	return arr ? arr : ZERO_SIZE_PTR;
 }
 
-static int copy_reference_state(struct bpf_func_state *dst, const struct bpf_func_state *src)
+static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf_verifier_state *src)
 {
 	dst->refs = copy_array(dst->refs, src->refs, src->acquired_refs,
 			       sizeof(struct bpf_reference_state), GFP_KERNEL);
 	if (!dst->refs)
 		return -ENOMEM;
 
-	dst->active_locks = src->active_locks;
 	dst->acquired_refs = src->acquired_refs;
+	dst->active_locks = src->active_locks;
+	dst->active_preempt_locks = src->active_preempt_locks;
+	dst->active_rcu_lock = src->active_rcu_lock;
 	return 0;
 }
 
@@ -1304,7 +1306,7 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
 	return 0;
 }
 
-static int resize_reference_state(struct bpf_func_state *state, size_t n)
+static int resize_reference_state(struct bpf_verifier_state *state, size_t n)
 {
 	state->refs = realloc_array(state->refs, state->acquired_refs, n,
 				    sizeof(struct bpf_reference_state));
@@ -1349,7 +1351,7 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
  */
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
 {
-	struct bpf_func_state *state = cur_func(env);
+	struct bpf_verifier_state *state = env->cur_state;
 	int new_ofs = state->acquired_refs;
 	int id, err;
 
@@ -1367,7 +1369,7 @@ static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
 static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum ref_state_type type,
 			      int id, void *ptr)
 {
-	struct bpf_func_state *state = cur_func(env);
+	struct bpf_verifier_state *state = env->cur_state;
 	int new_ofs = state->acquired_refs;
 	int err;
 
@@ -1384,7 +1386,7 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r
 }
 
 /* release function corresponding to acquire_reference_state(). Idempotent. */
-static int release_reference_state(struct bpf_func_state *state, int ptr_id)
+static int release_reference_state(struct bpf_verifier_state *state, int ptr_id)
 {
 	int i, last_idx;
 
@@ -1404,7 +1406,7 @@ static int release_reference_state(struct bpf_func_state *state, int ptr_id)
 	return -EINVAL;
 }
 
-static int release_lock_state(struct bpf_func_state *state, int type, int id, void *ptr)
+static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr)
 {
 	int i, last_idx;
 
@@ -1425,10 +1427,9 @@ static int release_lock_state(struct bpf_func_state *state, int type, int id, vo
 	return -EINVAL;
 }
 
-static struct bpf_reference_state *find_lock_state(struct bpf_verifier_env *env, enum ref_state_type type,
+static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *state, enum ref_state_type type,
 						   int id, void *ptr)
 {
-	struct bpf_func_state *state = cur_func(env);
 	int i;
 
 	for (i = 0; i < state->acquired_refs; i++) {
@@ -1447,7 +1448,6 @@ static void free_func_state(struct bpf_func_state *state)
 {
 	if (!state)
 		return;
-	kfree(state->refs);
 	kfree(state->stack);
 	kfree(state);
 }
@@ -1461,6 +1461,7 @@ static void free_verifier_state(struct bpf_verifier_state *state,
 		free_func_state(state->frame[i]);
 		state->frame[i] = NULL;
 	}
+	kfree(state->refs);
 	if (free_self)
 		kfree(state);
 }
@@ -1471,12 +1472,7 @@ static void free_verifier_state(struct bpf_verifier_state *state,
 static int copy_func_state(struct bpf_func_state *dst,
 			   const struct bpf_func_state *src)
 {
-	int err;
-
-	memcpy(dst, src, offsetof(struct bpf_func_state, acquired_refs));
-	err = copy_reference_state(dst, src);
-	if (err)
-		return err;
+	memcpy(dst, src, offsetof(struct bpf_func_state, stack));
 	return copy_stack_state(dst, src);
 }
 
@@ -1493,9 +1489,10 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 		free_func_state(dst_state->frame[i]);
 		dst_state->frame[i] = NULL;
 	}
+	err = copy_reference_state(dst_state, src);
+	if (err)
+		return err;
 	dst_state->speculative = src->speculative;
-	dst_state->active_rcu_lock = src->active_rcu_lock;
-	dst_state->active_preempt_lock = src->active_preempt_lock;
 	dst_state->in_sleepable = src->in_sleepable;
 	dst_state->curframe = src->curframe;
 	dst_state->branches = src->branches;
@@ -5496,7 +5493,7 @@ static bool in_sleepable(struct bpf_verifier_env *env)
 static bool in_rcu_cs(struct bpf_verifier_env *env)
 {
 	return env->cur_state->active_rcu_lock ||
-	       cur_func(env)->active_locks ||
+	       env->cur_state->active_locks ||
 	       !in_sleepable(env);
 }
 
@@ -7850,15 +7847,15 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg
  * Since only one bpf_spin_lock is allowed the checks are simpler than
  * reg_is_refcounted() logic. The verifier needs to remember only
  * one spin_lock instead of array of acquired_refs.
- * cur_func(env)->active_locks remembers which map value element or allocated
+ * env->cur_state->active_locks remembers which map value element or allocated
  * object got locked and clears it after bpf_spin_unlock.
  */
 static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			     bool is_lock)
 {
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	struct bpf_verifier_state *cur = env->cur_state;
 	bool is_const = tnum_is_const(reg->var_off);
-	struct bpf_func_state *cur = cur_func(env);
 	u64 val = reg->var_off.value;
 	struct bpf_map *map = NULL;
 	struct btf *btf = NULL;
@@ -7925,7 +7922,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			return -EINVAL;
 		}
 
-		if (release_lock_state(cur_func(env), REF_TYPE_LOCK, reg->id, ptr)) {
+		if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) {
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
@@ -9679,7 +9676,7 @@ static int release_reference(struct bpf_verifier_env *env,
 	struct bpf_reg_state *reg;
 	int err;
 
-	err = release_reference_state(cur_func(env), ref_obj_id);
+	err = release_reference_state(env->cur_state, ref_obj_id);
 	if (err)
 		return err;
 
@@ -9757,9 +9754,7 @@ static int setup_func_entry(struct bpf_verifier_env *env, int subprog, int calls
 			callsite,
 			state->curframe + 1 /* frameno within this callchain */,
 			subprog /* subprog number within this prog */);
-	/* Transfer references to the callee */
-	err = copy_reference_state(callee, caller);
-	err = err ?: set_callee_state_cb(env, caller, callee, callsite);
+	err = set_callee_state_cb(env, caller, callee, callsite);
 	if (err)
 		goto err_out;
 
@@ -9992,14 +9987,14 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		const char *sub_name = subprog_name(env, subprog);
 
 		/* Only global subprogs cannot be called with a lock held. */
-		if (cur_func(env)->active_locks) {
+		if (env->cur_state->active_locks) {
 			verbose(env, "global function calls are not allowed while holding a lock,\n"
 				     "use static function instead\n");
 			return -EINVAL;
 		}
 
 		/* Only global subprogs cannot be called with preemption disabled. */
-		if (env->cur_state->active_preempt_lock) {
+		if (env->cur_state->active_preempt_locks) {
 			verbose(env, "global function calls are not allowed with preemption disabled,\n"
 				     "use static function instead\n");
 			return -EINVAL;
@@ -10333,11 +10328,6 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 		caller->regs[BPF_REG_0] = *r0;
 	}
 
-	/* Transfer references to the caller */
-	err = copy_reference_state(caller, callee);
-	if (err)
-		return err;
-
 	/* for callbacks like bpf_loop or bpf_for_each_map_elem go back to callsite,
 	 * there function call logic would reschedule callback visit. If iteration
 	 * converges is_state_visited() would prune that visit eventually.
@@ -10502,11 +10492,11 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 
 static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
 {
-	struct bpf_func_state *state = cur_func(env);
+	struct bpf_verifier_state *state = env->cur_state;
 	bool refs_lingering = false;
 	int i;
 
-	if (!exception_exit && state->frameno)
+	if (!exception_exit && cur_func(env)->frameno)
 		return 0;
 
 	for (i = 0; i < state->acquired_refs; i++) {
@@ -10523,7 +10513,7 @@ static int check_resource_leak(struct bpf_verifier_env *env, bool exception_exit
 {
 	int err;
 
-	if (check_lock && cur_func(env)->active_locks) {
+	if (check_lock && env->cur_state->active_locks) {
 		verbose(env, "%s cannot be used inside bpf_spin_lock-ed region\n", prefix);
 		return -EINVAL;
 	}
@@ -10539,7 +10529,7 @@ static int check_resource_leak(struct bpf_verifier_env *env, bool exception_exit
 		return -EINVAL;
 	}
 
-	if (check_lock && env->cur_state->active_preempt_lock) {
+	if (check_lock && env->cur_state->active_preempt_locks) {
 		verbose(env, "%s cannot be used inside bpf_preempt_disable-ed region\n", prefix);
 		return -EINVAL;
 	}
@@ -10727,7 +10717,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
 	}
 
-	if (env->cur_state->active_preempt_lock) {
+	if (env->cur_state->active_preempt_locks) {
 		if (fn->might_sleep) {
 			verbose(env, "sleepable helper %s#%d in non-preemptible region\n",
 				func_id_name(func_id), func_id);
@@ -10784,7 +10774,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			struct bpf_func_state *state;
 			struct bpf_reg_state *reg;
 
-			err = release_reference_state(cur_func(env), ref_obj_id);
+			err = release_reference_state(env->cur_state, ref_obj_id);
 			if (!err) {
 				bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
 					if (reg->ref_obj_id == ref_obj_id) {
@@ -11746,7 +11736,7 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
 {
 	struct btf_record *rec = reg_btf_record(reg);
 
-	if (!cur_func(env)->active_locks) {
+	if (!env->cur_state->active_locks) {
 		verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
 		return -EFAULT;
 	}
@@ -11765,12 +11755,11 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
 
 static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
 {
-	struct bpf_func_state *state, *unused;
+	struct bpf_verifier_state *state = env->cur_state;
+	struct bpf_func_state *unused;
 	struct bpf_reg_state *reg;
 	int i;
 
-	state = cur_func(env);
-
 	if (!ref_obj_id) {
 		verbose(env, "verifier internal error: ref_obj_id is zero for "
 			     "owning -> non-owning conversion\n");
@@ -11860,9 +11849,9 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_
 	}
 	id = reg->id;
 
-	if (!cur_func(env)->active_locks)
+	if (!env->cur_state->active_locks)
 		return -EINVAL;
-	s = find_lock_state(env, REF_TYPE_LOCK, id, ptr);
+	s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr);
 	if (!s) {
 		verbose(env, "held lock and object are not in the same allocation\n");
 		return -EINVAL;
@@ -12789,17 +12778,17 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return -EINVAL;
 	}
 
-	if (env->cur_state->active_preempt_lock) {
+	if (env->cur_state->active_preempt_locks) {
 		if (preempt_disable) {
-			env->cur_state->active_preempt_lock++;
+			env->cur_state->active_preempt_locks++;
 		} else if (preempt_enable) {
-			env->cur_state->active_preempt_lock--;
+			env->cur_state->active_preempt_locks--;
 		} else if (sleepable) {
 			verbose(env, "kernel func %s is sleepable within non-preemptible region\n", func_name);
 			return -EACCES;
 		}
 	} else if (preempt_disable) {
-		env->cur_state->active_preempt_lock++;
+		env->cur_state->active_preempt_locks++;
 	} else if (preempt_enable) {
 		verbose(env, "unmatched attempt to enable preemption (kernel function %s)\n", func_name);
 		return -EINVAL;
@@ -15398,7 +15387,7 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
 		 * No one could have freed the reference state before
 		 * doing the NULL check.
 		 */
-		WARN_ON_ONCE(release_reference_state(state, id));
+		WARN_ON_ONCE(release_reference_state(vstate, id));
 
 	bpf_for_each_reg_in_vstate(vstate, state, reg, ({
 		mark_ptr_or_null_reg(state, reg, id, is_null);
@@ -17750,7 +17739,7 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 	return true;
 }
 
-static bool refsafe(struct bpf_func_state *old, struct bpf_func_state *cur,
+static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *cur,
 		    struct bpf_idmap *idmap)
 {
 	int i;
@@ -17758,6 +17747,15 @@ static bool refsafe(struct bpf_func_state *old, struct bpf_func_state *cur,
 	if (old->acquired_refs != cur->acquired_refs)
 		return false;
 
+	if (old->active_locks != cur->active_locks)
+		return false;
+
+	if (old->active_preempt_locks != cur->active_preempt_locks)
+		return false;
+
+	if (old->active_rcu_lock != cur->active_rcu_lock)
+		return false;
+
 	for (i = 0; i < old->acquired_refs; i++) {
 		if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) ||
 		    old->refs[i].type != cur->refs[i].type)
@@ -17820,9 +17818,6 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
 	if (!stacksafe(env, old, cur, &env->idmap_scratch, exact))
 		return false;
 
-	if (!refsafe(old, cur, &env->idmap_scratch))
-		return false;
-
 	return true;
 }
 
@@ -17850,13 +17845,10 @@ static bool states_equal(struct bpf_verifier_env *env,
 	if (old->speculative && !cur->speculative)
 		return false;
 
-	if (old->active_rcu_lock != cur->active_rcu_lock)
-		return false;
-
-	if (old->active_preempt_lock != cur->active_preempt_lock)
+	if (old->in_sleepable != cur->in_sleepable)
 		return false;
 
-	if (old->in_sleepable != cur->in_sleepable)
+	if (!refsafe(old, cur, &env->idmap_scratch))
 		return false;
 
 	/* for states to be equal callsites have to be the same
@@ -18751,7 +18743,7 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (cur_func(env)->active_locks) {
+				if (env->cur_state->active_locks) {
 					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
 					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
 					     (insn->off != 0 || !is_bpf_graph_api_kfunc(insn->imm)))) {
-- 
2.43.5



* [PATCH bpf-next v3 2/7] bpf: Refactor {acquire,release}_reference_state
From: Kumar Kartikeya Dwivedi @ 2024-11-27 16:58 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

In preparation for supporting more reference types that need to add and
remove reference state, refactor the acquire_reference_state and
release_reference_state functions to share common logic.

The acquire_reference_state function simply handles growing the acquired
refs and returning the pointer to the new uninitialized element, which
can be filled in by the caller.

The release_reference_state function simply erases the entry at a given
index from the refs array and decrements acquired_refs. Callers are
responsible for finding the matching element by comparing the relevant
fields of the reference state, and then requesting its deletion through
this function; it is not meant to be used on its own.
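The split between growing and erasing can be sketched in user space as follows. The sketch_* names are hypothetical, and realloc()/memset() stand in for the verifier's realloc_array(). The erase step uses the same strategy as the patch: the last entry is moved into the freed slot, so the order of entries is not preserved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_ref { int id; int type; int insn_idx; };

struct sketch_state {
	struct sketch_ref *refs;
	int acquired_refs;
};

/* Grow refs by one and return the new zeroed slot for the caller to fill. */
static struct sketch_ref *sketch_acquire_slot(struct sketch_state *s, int insn_idx)
{
	struct sketch_ref *grown =
		realloc(s->refs, (s->acquired_refs + 1) * sizeof(*grown));

	if (!grown)
		return NULL; /* s->refs is still valid on failure */
	s->refs = grown;
	memset(&s->refs[s->acquired_refs], 0, sizeof(*grown));
	s->refs[s->acquired_refs].insn_idx = insn_idx;
	return &s->refs[s->acquired_refs++];
}

/* Erase entry idx by moving the last entry into its place. */
static void sketch_release_slot(struct sketch_state *s, int idx)
{
	int last = s->acquired_refs - 1;

	assert(idx >= 0 && idx <= last);
	if (idx != last)
		s->refs[idx] = s->refs[last];
	memset(&s->refs[last], 0, sizeof(s->refs[last]));
	s->acquired_refs--;
}
```

The caller decides what goes into the slot (a generated id for pointers, or a lock id and pointer for locks), which is the flexibility the refactor is after.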

Existing callers of release_reference_state were using it to find and
remove the state for a given ref_obj_id without scrubbing the
associated registers in the verifier state. Introduce
release_reference_nomark to provide this functionality, and convert the
callers. release_reference now uses release_reference_nomark as well.
release_reference_nomark operates on a verifier state instead of taking
the verifier env, because mark_ptr_or_null_regs needs to operate on the
verifier states of both branches of a NULL check, so env->cur_state
cannot be used directly.
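Why the function takes a state rather than env can be illustrated with two branch states copied from one parent, as happens at a NULL check. This is a hypothetical fixed-size model: sketch_state, sketch_release_nomark() and sketch_release() are not verifier code. The null branch drops its reference entry without touching registers. In the verifier, mark_ptr_or_null_regs then updates those registers itself. A full release additionally invalidates every register that carried the reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAX_REFS 4
#define SKETCH_NR_REGS 4

/* Hypothetical model of one verifier state (i.e. one branch). */
struct sketch_state {
	int ref_ids[SKETCH_MAX_REFS];
	int acquired_refs;
	int reg_ref_obj_id[SKETCH_NR_REGS]; /* 0 = no reference held */
	bool reg_valid[SKETCH_NR_REGS];
};

/* Drop the entry for id from this state only; registers are untouched. */
static int sketch_release_nomark(struct sketch_state *st, int id)
{
	int i;

	assert(st->acquired_refs <= SKETCH_MAX_REFS);
	for (i = 0; i < st->acquired_refs; i++) {
		if (st->ref_ids[i] == id) {
			st->acquired_refs--;
			/* move the last entry into the hole */
			st->ref_ids[i] = st->ref_ids[st->acquired_refs];
			st->ref_ids[st->acquired_refs] = 0;
			return 0;
		}
	}
	return -EINVAL;
}

/* nomark + scrub: invalidate every register that carried the reference. */
static int sketch_release(struct sketch_state *st, int id)
{
	int i, err = sketch_release_nomark(st, id);

	if (err)
		return err;
	for (i = 0; i < SKETCH_NR_REGS; i++)
		if (st->reg_ref_obj_id[i] == id)
			st->reg_valid[i] = false;
	return 0;
}
```

Because each branch is a separate struct copy, releasing in one branch leaves the other branch's reference intact.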

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 113 +++++++++++++++++++++++-------------------
 1 file changed, 63 insertions(+), 50 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f8313e95eb8e..474cca3e8f66 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -196,7 +196,8 @@ struct bpf_verifier_stack_elem {
 
 #define BPF_PRIV_STACK_MIN_SIZE		64
 
-static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
+static int acquire_reference(struct bpf_verifier_env *env, int insn_idx);
+static int release_reference_nomark(struct bpf_verifier_state *state, int ref_obj_id);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
 static void invalidate_non_owning_refs(struct bpf_verifier_env *env);
 static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env);
@@ -771,7 +772,7 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
 		if (clone_ref_obj_id)
 			id = clone_ref_obj_id;
 		else
-			id = acquire_reference_state(env, insn_idx);
+			id = acquire_reference(env, insn_idx);
 
 		if (id < 0)
 			return id;
@@ -1033,7 +1034,7 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
 	if (spi < 0)
 		return spi;
 
-	id = acquire_reference_state(env, insn_idx);
+	id = acquire_reference(env, insn_idx);
 	if (id < 0)
 		return id;
 
@@ -1349,77 +1350,69 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
  * On success, returns a valid pointer id to associate with the register
  * On failure, returns a negative errno.
  */
-static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
+static struct bpf_reference_state *acquire_reference_state(struct bpf_verifier_env *env, int insn_idx, bool gen_id)
 {
 	struct bpf_verifier_state *state = env->cur_state;
 	int new_ofs = state->acquired_refs;
-	int id, err;
+	int err;
 
 	err = resize_reference_state(state, state->acquired_refs + 1);
 	if (err)
-		return err;
-	id = ++env->id_gen;
-	state->refs[new_ofs].type = REF_TYPE_PTR;
-	state->refs[new_ofs].id = id;
+		return NULL;
+	if (gen_id)
+		state->refs[new_ofs].id = ++env->id_gen;
 	state->refs[new_ofs].insn_idx = insn_idx;
 
-	return id;
+	return &state->refs[new_ofs];
+}
+
+static int acquire_reference(struct bpf_verifier_env *env, int insn_idx)
+{
+	struct bpf_reference_state *s;
+
+	s = acquire_reference_state(env, insn_idx, true);
+	if (!s)
+		return -ENOMEM;
+	s->type = REF_TYPE_PTR;
+	return s->id;
 }
 
 static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum ref_state_type type,
 			      int id, void *ptr)
 {
 	struct bpf_verifier_state *state = env->cur_state;
-	int new_ofs = state->acquired_refs;
-	int err;
+	struct bpf_reference_state *s;
 
-	err = resize_reference_state(state, state->acquired_refs + 1);
-	if (err)
-		return err;
-	state->refs[new_ofs].type = type;
-	state->refs[new_ofs].id = id;
-	state->refs[new_ofs].insn_idx = insn_idx;
-	state->refs[new_ofs].ptr = ptr;
+	s = acquire_reference_state(env, insn_idx, false);
+	s->type = type;
+	s->id = id;
+	s->ptr = ptr;
 
 	state->active_locks++;
 	return 0;
 }
 
-/* release function corresponding to acquire_reference_state(). Idempotent. */
-static int release_reference_state(struct bpf_verifier_state *state, int ptr_id)
+static void release_reference_state(struct bpf_verifier_state *state, int idx)
 {
-	int i, last_idx;
+	int last_idx;
 
 	last_idx = state->acquired_refs - 1;
-	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].type != REF_TYPE_PTR)
-			continue;
-		if (state->refs[i].id == ptr_id) {
-			if (last_idx && i != last_idx)
-				memcpy(&state->refs[i], &state->refs[last_idx],
-				       sizeof(*state->refs));
-			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
-			state->acquired_refs--;
-			return 0;
-		}
-	}
-	return -EINVAL;
+	if (last_idx && idx != last_idx)
+		memcpy(&state->refs[idx], &state->refs[last_idx], sizeof(*state->refs));
+	memset(&state->refs[last_idx], 0, sizeof(*state->refs));
+	state->acquired_refs--;
+	return;
 }
 
 static int release_lock_state(struct bpf_verifier_state *state, int type, int id, void *ptr)
 {
-	int i, last_idx;
+	int i;
 
-	last_idx = state->acquired_refs - 1;
 	for (i = 0; i < state->acquired_refs; i++) {
 		if (state->refs[i].type != type)
 			continue;
 		if (state->refs[i].id == id && state->refs[i].ptr == ptr) {
-			if (last_idx && i != last_idx)
-				memcpy(&state->refs[i], &state->refs[last_idx],
-				       sizeof(*state->refs));
-			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
-			state->acquired_refs--;
+			release_reference_state(state, i);
 			state->active_locks--;
 			return 0;
 		}
@@ -9666,21 +9659,41 @@ static void mark_pkt_end(struct bpf_verifier_state *vstate, int regn, bool range
 		reg->range = AT_PKT_END;
 }
 
+static int release_reference_nomark(struct bpf_verifier_state *state, int ref_obj_id)
+{
+	int i;
+
+	for (i = 0; i < state->acquired_refs; i++) {
+		if (state->refs[i].type != REF_TYPE_PTR)
+			continue;
+		if (state->refs[i].id == ref_obj_id) {
+			release_reference_state(state, i);
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
 /* The pointer with the specified id has released its reference to kernel
  * resources. Identify all copies of the same pointer and clear the reference.
+ *
+ * This is the release function corresponding to acquire_reference(). Idempotent.
+ * The 'mark' boolean is used to optionally skip scrubbing registers matching
+ * the ref_obj_id, in case they need to be switched to some other type instead
+ * of havoc scalar value.
  */
-static int release_reference(struct bpf_verifier_env *env,
-			     int ref_obj_id)
+static int release_reference(struct bpf_verifier_env *env, int ref_obj_id)
 {
+	struct bpf_verifier_state *vstate = env->cur_state;
 	struct bpf_func_state *state;
 	struct bpf_reg_state *reg;
 	int err;
 
-	err = release_reference_state(env->cur_state, ref_obj_id);
+	err = release_reference_nomark(vstate, ref_obj_id);
 	if (err)
 		return err;
 
-	bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+	bpf_for_each_reg_in_vstate(vstate, state, reg, ({
 		if (reg->ref_obj_id == ref_obj_id)
 			mark_reg_invalid(env, reg);
 	}));
@@ -10774,7 +10787,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			struct bpf_func_state *state;
 			struct bpf_reg_state *reg;
 
-			err = release_reference_state(env->cur_state, ref_obj_id);
+			err = release_reference_nomark(env->cur_state, ref_obj_id);
 			if (!err) {
 				bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
 					if (reg->ref_obj_id == ref_obj_id) {
@@ -11107,7 +11120,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		/* For release_reference() */
 		regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
 	} else if (is_acquire_function(func_id, meta.map_ptr)) {
-		int id = acquire_reference_state(env, insn_idx);
+		int id = acquire_reference(env, insn_idx);
 
 		if (id < 0)
 			return id;
@@ -13087,7 +13100,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 		mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
 		if (is_kfunc_acquire(&meta)) {
-			int id = acquire_reference_state(env, insn_idx);
+			int id = acquire_reference(env, insn_idx);
 
 			if (id < 0)
 				return id;
@@ -15387,7 +15400,7 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
 		 * No one could have freed the reference state before
 		 * doing the NULL check.
 		 */
-		WARN_ON_ONCE(release_reference_state(vstate, id));
+		WARN_ON_ONCE(release_reference_nomark(vstate, id));
 
 	bpf_for_each_reg_in_vstate(vstate, state, reg, ({
 		mark_ptr_or_null_reg(state, reg, id, is_null);
-- 
2.43.5


* [PATCH bpf-next v3 3/7] bpf: Refactor mark_{dynptr,iter}_read
  2024-11-27 16:58 [PATCH bpf-next v3 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
  2024-11-27 16:58 ` [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state Kumar Kartikeya Dwivedi
  2024-11-27 16:58 ` [PATCH bpf-next v3 2/7] bpf: Refactor {acquire,release}_reference_state Kumar Kartikeya Dwivedi
@ 2024-11-27 16:58 ` Kumar Kartikeya Dwivedi
  2024-11-27 16:58 ` [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-27 16:58 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, kernel-team

There is an opportunity to share code between mark_dynptr_read and
mark_iter_read for updating the liveness information of their stack slots.
Consolidate the common logic into a mark_stack_slot_obj_read function in
preparation for the next patch, which needs the same logic for its own
stack slots.
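
For illustration only, here is a userspace sketch of the slot walk that the consolidated helper performs: an object spanning nr_slots stack slots starting at spi covers spi, spi - 1, ..., spi - nr_slots + 1. The struct slot type, its flags and mark_obj_read() are hypothetical stand-ins for bpf_stack_state, mark_reg_read() and mark_stack_slot_scratched(); liveness propagation to parent states is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_STACK_SLOTS 8 /* hypothetical, kept small for illustration */

/* Hypothetical stand-in for struct bpf_stack_state. */
struct slot {
	bool read;      /* models REG_LIVE_READ64 on spilled_ptr */
	bool scratched; /* models mark_stack_slot_scratched() */
};

/*
 * Mirrors the loop in mark_stack_slot_obj_read(): an object spanning
 * nr_slots slots starting at spi occupies spi, spi - 1, ..., spi - nr_slots + 1.
 */
static int mark_obj_read(struct slot *stack, int spi, int nr_slots)
{
	int i;

	assert(nr_slots > 0);
	/* The verifier validates spi before this point; the sketch checks it here. */
	if (spi >= NR_STACK_SLOTS || spi - nr_slots + 1 < 0)
		return -1;

	for (i = 0; i < nr_slots; i++) {
		stack[spi - i].read = true;
		stack[spi - i].scratched = true;
	}
	return 0;
}
```

For example, a dynptr (BPF_DYNPTR_NR_SLOTS, i.e. 2 slots) at spi 5 marks slots 5 and 4, while a single-slot object at spi 2 marks only slot 2.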

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 474cca3e8f66..be2365a9794a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3192,10 +3192,27 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int mark_dynptr_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+static int mark_stack_slot_obj_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				    int spi, int nr_slots)
 {
 	struct bpf_func_state *state = func(env, reg);
-	int spi, ret;
+	int err, i;
+
+	for (i = 0; i < nr_slots; i++) {
+		struct bpf_reg_state *st = &state->stack[spi - i].spilled_ptr;
+
+		err = mark_reg_read(env, st, st->parent, REG_LIVE_READ64);
+		if (err)
+			return err;
+
+		mark_stack_slot_scratched(env, spi - i);
+	}
+	return 0;
+}
+
+static int mark_dynptr_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	int spi;
 
 	/* For CONST_PTR_TO_DYNPTR, it must have already been done by
 	 * check_reg_arg in check_helper_call and mark_btf_func_reg_size in
@@ -3210,31 +3227,13 @@ static int mark_dynptr_read(struct bpf_verifier_env *env, struct bpf_reg_state *
 	 * bounds and spi is the first dynptr slot. Simply mark stack slot as
 	 * read.
 	 */
-	ret = mark_reg_read(env, &state->stack[spi].spilled_ptr,
-			    state->stack[spi].spilled_ptr.parent, REG_LIVE_READ64);
-	if (ret)
-		return ret;
-	return mark_reg_read(env, &state->stack[spi - 1].spilled_ptr,
-			     state->stack[spi - 1].spilled_ptr.parent, REG_LIVE_READ64);
+	return mark_stack_slot_obj_read(env, reg, spi, BPF_DYNPTR_NR_SLOTS);
 }
 
 static int mark_iter_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 			  int spi, int nr_slots)
 {
-	struct bpf_func_state *state = func(env, reg);
-	int err, i;
-
-	for (i = 0; i < nr_slots; i++) {
-		struct bpf_reg_state *st = &state->stack[spi - i].spilled_ptr;
-
-		err = mark_reg_read(env, st, st->parent, REG_LIVE_READ64);
-		if (err)
-			return err;
-
-		mark_stack_slot_scratched(env, spi - i);
-	}
-
-	return 0;
+	return mark_stack_slot_obj_read(env, reg, spi, nr_slots);
 }
 
 /* This function is supposed to be used by the following 32-bit optimization
-- 
2.43.5


* [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-27 16:58 [PATCH bpf-next v3 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2024-11-27 16:58 ` [PATCH bpf-next v3 3/7] bpf: Refactor mark_{dynptr,iter}_read Kumar Kartikeya Dwivedi
@ 2024-11-27 16:58 ` Kumar Kartikeya Dwivedi
  2024-11-28  4:31   ` Eduard Zingerman
  2024-11-27 16:58 ` [PATCH bpf-next v3 5/7] bpf: Improve verifier log for resource leak on exit Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-27 16:58 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, kernel-team

Teach the verifier about IRQ-disabled sections through the introduction
of two new kfuncs: bpf_local_irq_save, which saves the IRQ state and
disables IRQs, and bpf_local_irq_restore, which restores the saved IRQ
state (re-enabling IRQs if they were enabled at save time).

For the purposes of tracking the saved IRQ state, the verifier is taught
about a new special object on the stack of type STACK_IRQ_FLAG. This is
an 8-byte value holding the saved IRQ flags, which are later passed back
to the IRQ restore kfunc.
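
As a simplified model (not the verifier code itself), the lifecycle of such a slot can be sketched in userspace C: saving requires that no byte of the 8-byte slot is already tagged as an irq flag, restoring requires all 8 bytes to be tagged plus a non-zero reference id, and restoring invalidates the slot. The enum, struct and function names below are hypothetical; the checks follow is_irq_flag_reg_valid_uninit() and is_irq_flag_reg_valid_init() from this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define BPF_REG_SIZE 8 /* bytes per stack slot, as in the verifier */

/* Hypothetical subset of enum bpf_stack_slot_type. */
enum slot_type { SLOT_INVALID, SLOT_MISC, SLOT_IRQ_FLAG };

struct irq_slot {
	enum slot_type slot_type[BPF_REG_SIZE]; /* per-byte type tag */
	int ref_obj_id;                         /* id of the saved IRQ state */
};

/* Precondition for save: no byte of the slot is already an irq flag. */
static bool irq_slot_valid_uninit(const struct irq_slot *s)
{
	for (int i = 0; i < BPF_REG_SIZE; i++)
		if (s->slot_type[i] == SLOT_IRQ_FLAG)
			return false;
	return true;
}

/* Precondition for restore: a non-zero id and every byte tagged. */
static bool irq_slot_valid_init(const struct irq_slot *s)
{
	if (!s->ref_obj_id)
		return false;
	for (int i = 0; i < BPF_REG_SIZE; i++)
		if (s->slot_type[i] != SLOT_IRQ_FLAG)
			return false;
	return true;
}

/* Save: tag all bytes and remember the reference id. */
static int irq_slot_save(struct irq_slot *s, int id)
{
	if (!irq_slot_valid_uninit(s))
		return -1;
	for (int i = 0; i < BPF_REG_SIZE; i++)
		s->slot_type[i] = SLOT_IRQ_FLAG;
	s->ref_obj_id = id;
	return 0;
}

/* Restore: consume the flag and invalidate the slot. */
static int irq_slot_restore(struct irq_slot *s)
{
	if (!irq_slot_valid_init(s))
		return -1;
	for (int i = 0; i < BPF_REG_SIZE; i++)
		s->slot_type[i] = SLOT_INVALID;
	s->ref_obj_id = 0;
	return 0;
}
```

In this model, saving twice into the same slot, restoring a slot with one byte retagged (a stand-in for a partial overwrite), and restoring the same slot twice are all rejected.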

Renumber the REF_TYPE_* enums to simplify the check in find_lock_state:
explicitly filtering out each non-lock type would become cumbersome as
the number of types grows, and is unnecessary.

To track a dynamic number of IRQ-disabled regions and their associated
saved states, a new reference type REF_TYPE_IRQ is introduced, along with
its state management functions, acquire_irq_state and release_irq_state,
taking advantage of the refactoring and clean ups made in earlier
commits.

One notable requirement of the kernel's IRQ save and restore API is that
calls cannot happen out of order. For this purpose, when releasing a
REF_TYPE_IRQ reference we keep track of the id of the previous
REF_TYPE_IRQ reference we saw (prev_id). Since reference states are
inserted in increasing order of index, this remembers the order in which
IRQ states were saved, so that we maintain a logical stack of resource
identities in acquisition order and can enforce LIFO ordering when
restoring IRQ state. The top of the stack is maintained in
bpf_verifier_state's active_irq_id.
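
A minimal userspace sketch of this LIFO check, assuming a fixed-size refs array and hypothetical type and function names; release_irq() follows the structure of release_irq_state() in this patch. The sketch removes entries by shifting later entries down rather than swapping, so refs[] stays in acquisition order, which the prev_id scan depends on.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_REFS 16 /* hypothetical fixed bound for the sketch */

/* Mirrors the renumbered ref_state_type values. */
enum ref_type { REF_PTR = 1, REF_IRQ = 2, REF_LOCK = 3 };

struct ref {
	enum ref_type type;
	int id;
};

/* Hypothetical subset of bpf_verifier_state. */
struct vstate {
	struct ref refs[MAX_REFS];
	int acquired_refs;
	int active_irq_id; /* top of the logical IRQ stack, 0 if none */
	int next_id;
};

static int acquire_ref(struct vstate *st, enum ref_type type)
{
	struct ref *r;

	if (st->acquired_refs == MAX_REFS)
		return -ENOMEM;
	r = &st->refs[st->acquired_refs++]; /* appended in acquisition order */
	r->type = type;
	r->id = ++st->next_id;
	if (type == REF_IRQ)
		st->active_irq_id = r->id;
	return r->id;
}

/* Remove refs[idx] by shifting later entries down, keeping the order. */
static void release_ref_at(struct vstate *st, int idx)
{
	memmove(&st->refs[idx], &st->refs[idx + 1],
		sizeof(st->refs[0]) * (st->acquired_refs - idx - 1));
	st->acquired_refs--;
}

/* Only the innermost (most recently saved) IRQ state may be restored. */
static int release_irq(struct vstate *st, int id)
{
	int prev_id = 0;
	int i;

	if (id != st->active_irq_id)
		return -EACCES;

	for (i = 0; i < st->acquired_refs; i++) {
		if (st->refs[i].type != REF_IRQ)
			continue;
		if (st->refs[i].id == id) {
			release_ref_at(st, i);
			st->active_irq_id = prev_id; /* previous IRQ ref becomes the top */
			return 0;
		}
		prev_id = st->refs[i].id;
	}
	return -EINVAL;
}
```

For example, after saving IRQ state A, acquiring a pointer reference, and saving IRQ state B, restoring A first fails with -EACCES; restoring B makes A the top of the stack again, and restoring A then leaves active_irq_id at 0 with only the pointer reference remaining.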

The logic to detect initialized and uninitialized irq flag slots, and to
mark and unmark them, is similar to how it's done for iterators. No
additional checks are needed in refsafe for REF_TYPE_IRQ, apart from the
usual check_ids satisfiability check on refs[i].id. The same check_ids
check must also be performed on state->active_irq_id.
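
To show why active_irq_id goes through check_ids rather than a plain comparison, here is a simplified userspace model of what check_ids checks (the names and map size are hypothetical): two states may use different ids for the same logical IRQ region, and they are compatible as long as the old-to-current id mapping stays consistent across refs[], stack slots and active_irq_id.

```c
#include <assert.h>
#include <stdbool.h>

#define ID_MAP_SIZE 8 /* hypothetical; the verifier sizes its map differently */

struct id_pair {
	unsigned int old, cur;
};

struct idmap {
	struct id_pair map[ID_MAP_SIZE];
};

/* Ids need not be equal between states, only consistently renamed. */
static bool ids_match(unsigned int old_id, unsigned int cur_id, struct idmap *idmap)
{
	/* Either both ids are set or both are zero (e.g. no active IRQ region). */
	if (!!old_id != !!cur_id)
		return false;
	if (!old_id)
		return true;

	for (int i = 0; i < ID_MAP_SIZE; i++) {
		struct id_pair *p = &idmap->map[i];

		if (!p->old) {        /* first time old_id is seen: record the pair */
			p->old = old_id;
			p->cur = cur_id;
			return true;
		}
		if (p->old == old_id) /* seen before: must map to the same cur_id */
			return p->cur == cur_id;
		if (p->cur == cur_id) /* cur_id already paired with another old_id */
			return false;
	}
	return false; /* map exhausted */
}
```

Here old id 4 may pair with current id 7, but once paired, neither id can be matched with a different partner, and a zero id (no active IRQ region) only matches zero.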

The kfuncs themselves are plain wrappers over the local_irq_save and
local_irq_restore macros.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |   8 +-
 kernel/bpf/helpers.c         |  17 +++
 kernel/bpf/log.c             |   1 +
 kernel/bpf/verifier.c        | 279 ++++++++++++++++++++++++++++++++++-
 4 files changed, 302 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index af64b5415df8..3da7ae6c7bba 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -233,6 +233,7 @@ enum bpf_stack_slot_type {
 	 */
 	STACK_DYNPTR,
 	STACK_ITER,
+	STACK_IRQ_FLAG,
 };
 
 #define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
@@ -254,8 +255,10 @@ struct bpf_reference_state {
 	 * default to pointer reference on zero initialization of a state.
 	 */
 	enum ref_state_type {
-		REF_TYPE_PTR = 0,
-		REF_TYPE_LOCK,
+		REF_TYPE_PTR	= 1,
+		REF_TYPE_IRQ	= 2,
+
+		REF_TYPE_LOCK	= 3,
 	} type;
 	/* Track each reference created with a unique id, even if the same
 	 * instruction creates the reference multiple times (eg, via CALL).
@@ -420,6 +423,7 @@ struct bpf_verifier_state {
 	u32 acquired_refs;
 	u32 active_locks;
 	u32 active_preempt_locks;
+	u32 active_irq_id;
 	bool active_rcu_lock;
 
 	bool speculative;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 751c150f9e1c..532ea74d4850 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3057,6 +3057,21 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
 	return ret + 1;
 }
 
+/* Keep unsigned long in the prototype so that the kfunc is usable directly
+ * when emitted to vmlinux.h for BPF programs. Note that while in the BPF prog
+ * the unsigned long always points to an 8-byte region on the stack, the kernel
+ * may only read and write 4 bytes of it on 32-bit architectures.
+ */
+__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)
+{
+	local_irq_save(*flags__irq_flag);
+}
+
+__bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
+{
+	local_irq_restore(*flags__irq_flag);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(generic_btf_ids)
@@ -3149,6 +3164,8 @@ BTF_ID_FLAGS(func, bpf_get_kmem_cache)
 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_new, KF_ITER_NEW | KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_local_irq_save)
+BTF_ID_FLAGS(func, bpf_local_irq_restore)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 8b52e5b7504c..434fc320ba1d 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -537,6 +537,7 @@ static char slot_type_char[] = {
 	[STACK_ZERO]	= '0',
 	[STACK_DYNPTR]	= 'd',
 	[STACK_ITER]	= 'i',
+	[STACK_IRQ_FLAG] = 'f'
 };
 
 static void print_liveness(struct bpf_verifier_env *env,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index be2365a9794a..c6b40da49835 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -661,6 +661,11 @@ static int iter_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 	return stack_slot_obj_get_spi(env, reg, "iter", nr_slots);
 }
 
+static int irq_flag_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	return stack_slot_obj_get_spi(env, reg, "irq_flag", 1);
+}
+
 static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type)
 {
 	switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
@@ -1156,10 +1161,126 @@ static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_s
 	return 0;
 }
 
+static int acquire_irq_state(struct bpf_verifier_env *env, int insn_idx);
+static int release_irq_state(struct bpf_verifier_state *state, int id);
+
+static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env,
+				     struct bpf_kfunc_call_arg_meta *meta,
+				     struct bpf_reg_state *reg, int insn_idx)
+{
+	struct bpf_func_state *state = func(env, reg);
+	struct bpf_stack_state *slot;
+	struct bpf_reg_state *st;
+	int spi, i, id;
+
+	spi = irq_flag_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+
+	id = acquire_irq_state(env, insn_idx);
+	if (id < 0)
+		return id;
+
+	slot = &state->stack[spi];
+	st = &slot->spilled_ptr;
+
+	__mark_reg_known_zero(st);
+	st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
+	st->live |= REG_LIVE_WRITTEN;
+	st->ref_obj_id = id;
+
+	for (i = 0; i < BPF_REG_SIZE; i++)
+		slot->slot_type[i] = STACK_IRQ_FLAG;
+
+	mark_stack_slot_scratched(env, spi);
+	return 0;
+}
+
+static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	struct bpf_stack_state *slot;
+	struct bpf_reg_state *st;
+	int spi, i, err;
+
+	spi = irq_flag_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+
+	slot = &state->stack[spi];
+	st = &slot->spilled_ptr;
+
+	err = release_irq_state(env->cur_state, st->ref_obj_id);
+	WARN_ON_ONCE(err && err != -EACCES);
+	if (err) {
+		verbose(env, "cannot restore irq state out of order\n");
+		return err;
+	}
+
+	__mark_reg_not_init(env, st);
+
+	/* see unmark_stack_slots_dynptr() for why we need to set REG_LIVE_WRITTEN */
+	st->live |= REG_LIVE_WRITTEN;
+
+	for (i = 0; i < BPF_REG_SIZE; i++)
+		slot->slot_type[i] = STACK_INVALID;
+
+	mark_stack_slot_scratched(env, spi);
+	return 0;
+}
+
+static bool is_irq_flag_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	struct bpf_stack_state *slot;
+	int spi, i;
+
+	/* For -ERANGE (i.e. spi not falling into allocated stack slots), we
+	 * will do check_mem_access to check and update stack bounds later, so
+	 * return true for that case.
+	 */
+	spi = irq_flag_get_spi(env, reg);
+	if (spi == -ERANGE)
+		return true;
+	if (spi < 0)
+		return false;
+
+	slot = &state->stack[spi];
+
+	for (i = 0; i < BPF_REG_SIZE; i++)
+		if (slot->slot_type[i] == STACK_IRQ_FLAG)
+			return false;
+	return true;
+}
+
+static int is_irq_flag_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	struct bpf_stack_state *slot;
+	struct bpf_reg_state *st;
+	int spi, i;
+
+	spi = irq_flag_get_spi(env, reg);
+	if (spi < 0)
+		return -EINVAL;
+
+	slot = &state->stack[spi];
+	st = &slot->spilled_ptr;
+
+	if (!st->ref_obj_id)
+		return -EINVAL;
+
+	for (i = 0; i < BPF_REG_SIZE; i++)
+		if (slot->slot_type[i] != STACK_IRQ_FLAG)
+			return -EINVAL;
+	return 0;
+}
+
 /* Check if given stack slot is "special":
  *   - spilled register state (STACK_SPILL);
  *   - dynptr state (STACK_DYNPTR);
  *   - iter state (STACK_ITER).
+ *   - irq flag state (STACK_IRQ_FLAG)
  */
 static bool is_stack_slot_special(const struct bpf_stack_state *stack)
 {
@@ -1169,6 +1290,7 @@ static bool is_stack_slot_special(const struct bpf_stack_state *stack)
 	case STACK_SPILL:
 	case STACK_DYNPTR:
 	case STACK_ITER:
+	case STACK_IRQ_FLAG:
 		return true;
 	case STACK_INVALID:
 	case STACK_MISC:
@@ -1291,6 +1413,7 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf
 	dst->active_locks = src->active_locks;
 	dst->active_preempt_locks = src->active_preempt_locks;
 	dst->active_rcu_lock = src->active_rcu_lock;
+	dst->active_irq_id = src->active_irq_id;
 	return 0;
 }
 
@@ -1392,6 +1515,20 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r
 	return 0;
 }
 
+static int acquire_irq_state(struct bpf_verifier_env *env, int insn_idx)
+{
+	struct bpf_verifier_state *state = env->cur_state;
+	struct bpf_reference_state *s;
+
+	s = acquire_reference_state(env, insn_idx, true);
+	if (!s)
+		return -ENOMEM;
+	s->type = REF_TYPE_IRQ;
+
+	state->active_irq_id = s->id;
+	return s->id;
+}
+
 static void release_reference_state(struct bpf_verifier_state *state, int idx)
 {
 	int last_idx;
@@ -1420,6 +1557,28 @@ static int release_lock_state(struct bpf_verifier_state *state, int type, int id
 	return -EINVAL;
 }
 
+static int release_irq_state(struct bpf_verifier_state *state, int id)
+{
+	u32 prev_id = 0;
+	int i;
+
+	if (id != state->active_irq_id)
+		return -EACCES;
+
+	for (i = 0; i < state->acquired_refs; i++) {
+		if (state->refs[i].type != REF_TYPE_IRQ)
+			continue;
+		if (state->refs[i].id == id) {
+			release_reference_state(state, i);
+			state->active_irq_id = prev_id;
+			return 0;
+		} else {
+			prev_id = state->refs[i].id;
+		}
+	}
+	return -EINVAL;
+}
+
 static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *state, enum ref_state_type type,
 						   int id, void *ptr)
 {
@@ -1428,7 +1587,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st
 	for (i = 0; i < state->acquired_refs; i++) {
 		struct bpf_reference_state *s = &state->refs[i];
 
-		if (s->type == REF_TYPE_PTR || s->type != type)
+		if (s->type != type)
 			continue;
 
 		if (s->id == id && s->ptr == ptr)
@@ -3236,6 +3395,16 @@ static int mark_iter_read(struct bpf_verifier_env *env, struct bpf_reg_state *re
 	return mark_stack_slot_obj_read(env, reg, spi, nr_slots);
 }
 
+static int mark_irq_flag_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	int spi;
+
+	spi = irq_flag_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+	return mark_stack_slot_obj_read(env, reg, spi, 1);
+}
+
 /* This function is supposed to be used by the following 32-bit optimization
  * code only. It returns TRUE if the source or destination register operates
  * on 64-bit, otherwise return FALSE.
@@ -10012,6 +10181,12 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			return -EINVAL;
 		}
 
+		if (env->cur_state->active_irq_id) {
+			verbose(env, "global function calls are not allowed with IRQs disabled,\n"
+				     "use static function instead\n");
+			return -EINVAL;
+		}
+
 		if (err) {
 			verbose(env, "Caller passes invalid args into func#%d ('%s')\n",
 				subprog, sub_name);
@@ -10536,6 +10711,11 @@ static int check_resource_leak(struct bpf_verifier_env *env, bool exception_exit
 		return err;
 	}
 
+	if (check_lock && env->cur_state->active_irq_id) {
+		verbose(env, "%s cannot be used inside bpf_local_irq_save-ed region\n", prefix);
+		return -EINVAL;
+	}
+
 	if (check_lock && env->cur_state->active_rcu_lock) {
 		verbose(env, "%s cannot be used inside bpf_rcu_read_lock-ed region\n", prefix);
 		return -EINVAL;
@@ -10740,6 +10920,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
 	}
 
+	if (env->cur_state->active_irq_id) {
+		if (fn->might_sleep) {
+			verbose(env, "sleepable helper %s#%d in IRQ-disabled region\n",
+				func_id_name(func_id), func_id);
+			return -EINVAL;
+		}
+
+		if (in_sleepable(env) && is_storage_get_function(func_id))
+			env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
+	}
+
 	meta.func_id = func_id;
 	/* check args */
 	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
@@ -11301,6 +11492,11 @@ static bool is_kfunc_arg_const_str(const struct btf *btf, const struct btf_param
 	return btf_param_match_suffix(btf, arg, "__str");
 }
 
+static bool is_kfunc_arg_irq_flag(const struct btf *btf, const struct btf_param *arg)
+{
+	return btf_param_match_suffix(btf, arg, "__irq_flag");
+}
+
 static bool is_kfunc_arg_scalar_with_name(const struct btf *btf,
 					  const struct btf_param *arg,
 					  const char *name)
@@ -11454,6 +11650,7 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_CONST_STR,
 	KF_ARG_PTR_TO_MAP,
 	KF_ARG_PTR_TO_WORKQUEUE,
+	KF_ARG_PTR_TO_IRQ_FLAG,
 };
 
 enum special_kfunc_type {
@@ -11485,6 +11682,8 @@ enum special_kfunc_type {
 	KF_bpf_iter_css_task_new,
 	KF_bpf_session_cookie,
 	KF_bpf_get_kmem_cache,
+	KF_bpf_local_irq_save,
+	KF_bpf_local_irq_restore,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -11551,6 +11750,8 @@ BTF_ID(func, bpf_session_cookie)
 BTF_ID_UNUSED
 #endif
 BTF_ID(func, bpf_get_kmem_cache)
+BTF_ID(func, bpf_local_irq_save)
+BTF_ID(func, bpf_local_irq_restore)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -11641,6 +11842,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_wq(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_WORKQUEUE;
 
+	if (is_kfunc_arg_irq_flag(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_IRQ_FLAG;
+
 	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
 		if (!btf_type_is_struct(ref_t)) {
 			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -11744,6 +11948,54 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int process_irq_flag(struct bpf_verifier_env *env, int regno,
+			     struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	bool irq_save;
+	int err;
+
+	if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) {
+		irq_save = true;
+	} else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) {
+		irq_save = false;
+	} else {
+		verbose(env, "verifier internal error: unknown irq flags kfunc\n");
+		return -EFAULT;
+	}
+
+	if (irq_save) {
+		if (!is_irq_flag_reg_valid_uninit(env, reg)) {
+			verbose(env, "expected uninitialized irq flag as arg#%d\n", regno);
+			return -EINVAL;
+		}
+
+		err = check_mem_access(env, env->insn_idx, regno, 0, BPF_DW, BPF_WRITE, -1, false, false);
+		if (err)
+			return err;
+
+		err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx);
+		if (err)
+			return err;
+	} else {
+		err = is_irq_flag_reg_valid_init(env, reg);
+		if (err) {
+			verbose(env, "expected an initialized irq flag as arg#%d\n", regno);
+			return err;
+		}
+
+		err = mark_irq_flag_read(env, reg);
+		if (err)
+			return err;
+
+		err = unmark_stack_slot_irq_flag(env, reg);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+
 static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
 	struct btf_record *rec = reg_btf_record(reg);
@@ -12332,6 +12584,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
 		case KF_ARG_PTR_TO_CONST_STR:
 		case KF_ARG_PTR_TO_WORKQUEUE:
+		case KF_ARG_PTR_TO_IRQ_FLAG:
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -12626,6 +12879,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (ret < 0)
 				return ret;
 			break;
+		case KF_ARG_PTR_TO_IRQ_FLAG:
+			if (reg->type != PTR_TO_STACK) {
+				verbose(env, "arg#%d doesn't point to an irq flag on stack\n", i);
+				return -EINVAL;
+			}
+			ret = process_irq_flag(env, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		}
 	}
 
@@ -12806,6 +13068,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return -EINVAL;
 	}
 
+	if (env->cur_state->active_irq_id && sleepable) {
+		verbose(env, "kernel func %s is sleepable within IRQ-disabled region\n", func_name);
+		return -EACCES;
+	}
+
 	/* In case of release function, we get register number of refcounted
 	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
 	 */
@@ -17739,6 +18006,12 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 			    !check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
 				return false;
 			break;
+		case STACK_IRQ_FLAG:
+			old_reg = &old->stack[spi].spilled_ptr;
+			cur_reg = &cur->stack[spi].spilled_ptr;
+			if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
+				return false;
+			break;
 		case STACK_MISC:
 		case STACK_ZERO:
 		case STACK_INVALID:
@@ -17768,12 +18041,16 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c
 	if (old->active_rcu_lock != cur->active_rcu_lock)
 		return false;
 
+	if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap))
+		return false;
+
 	for (i = 0; i < old->acquired_refs; i++) {
 		if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) ||
 		    old->refs[i].type != cur->refs[i].type)
 			return false;
 		switch (old->refs[i].type) {
 		case REF_TYPE_PTR:
+		case REF_TYPE_IRQ:
 			break;
 		case REF_TYPE_LOCK:
 			if (old->refs[i].ptr != cur->refs[i].ptr)
-- 
2.43.5


* [PATCH bpf-next v3 5/7] bpf: Improve verifier log for resource leak on exit
  2024-11-27 16:58 [PATCH bpf-next v3 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2024-11-27 16:58 ` [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
@ 2024-11-27 16:58 ` Kumar Kartikeya Dwivedi
  2024-11-28  4:34   ` Eduard Zingerman
  2024-11-27 16:58 ` [PATCH bpf-next v3 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc Kumar Kartikeya Dwivedi
  2024-11-27 16:58 ` [PATCH bpf-next v3 7/7] selftests/bpf: Add IRQ save/restore tests Kumar Kartikeya Dwivedi
  6 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-27 16:58 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, kernel-team

The verifier log when leaking resources on BPF_EXIT may be a bit
confusing, as a leak is a problem only when finally exiting from the main
prog, not from any of the subprogs. Hence, update the verifier error
string and the corresponding selftests matching on it.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c                              |  2 +-
 .../testing/selftests/bpf/progs/exceptions_fail.c  |  4 ++--
 tools/testing/selftests/bpf/progs/preempt_lock.c   | 14 +++++++-------
 .../selftests/bpf/progs/verifier_spin_lock.c       |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c6b40da49835..b9fdb7e362ca 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19088,7 +19088,7 @@ static int do_check(struct bpf_verifier_env *env)
 				 * match caller reference state when it exits.
 				 */
 				err = check_resource_leak(env, exception_exit, !env->cur_state->curframe,
-							  "BPF_EXIT instruction");
+							  "BPF_EXIT instruction in main prog");
 				if (err)
 					return err;
 
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index fe0f3fa5aab6..8a0fdff89927 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -131,7 +131,7 @@ int reject_subprog_with_lock(void *ctx)
 }
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_rcu_read_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_rcu_read_lock-ed region")
 int reject_with_rcu_read_lock(void *ctx)
 {
 	bpf_rcu_read_lock();
@@ -147,7 +147,7 @@ __noinline static int throwing_subprog(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_rcu_read_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_rcu_read_lock-ed region")
 int reject_subprog_with_rcu_read_lock(void *ctx)
 {
 	bpf_rcu_read_lock();
diff --git a/tools/testing/selftests/bpf/progs/preempt_lock.c b/tools/testing/selftests/bpf/progs/preempt_lock.c
index 885377e83607..5269571cf7b5 100644
--- a/tools/testing/selftests/bpf/progs/preempt_lock.c
+++ b/tools/testing/selftests/bpf/progs/preempt_lock.c
@@ -6,7 +6,7 @@
 #include "bpf_experimental.h"
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
 int preempt_lock_missing_1(struct __sk_buff *ctx)
 {
 	bpf_preempt_disable();
@@ -14,7 +14,7 @@ int preempt_lock_missing_1(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
 int preempt_lock_missing_2(struct __sk_buff *ctx)
 {
 	bpf_preempt_disable();
@@ -23,7 +23,7 @@ int preempt_lock_missing_2(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
 int preempt_lock_missing_3(struct __sk_buff *ctx)
 {
 	bpf_preempt_disable();
@@ -33,7 +33,7 @@ int preempt_lock_missing_3(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
 int preempt_lock_missing_3_minus_2(struct __sk_buff *ctx)
 {
 	bpf_preempt_disable();
@@ -55,7 +55,7 @@ static __noinline void preempt_enable(void)
 }
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
 int preempt_lock_missing_1_subprog(struct __sk_buff *ctx)
 {
 	preempt_disable();
@@ -63,7 +63,7 @@ int preempt_lock_missing_1_subprog(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
 int preempt_lock_missing_2_subprog(struct __sk_buff *ctx)
 {
 	preempt_disable();
@@ -72,7 +72,7 @@ int preempt_lock_missing_2_subprog(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
 int preempt_lock_missing_2_minus_1_subprog(struct __sk_buff *ctx)
 {
 	preempt_disable();
diff --git a/tools/testing/selftests/bpf/progs/verifier_spin_lock.c b/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
index 3f679de73229..25599eac9a70 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
@@ -187,7 +187,7 @@ l0_%=:	r6 = r0;					\
 
 SEC("cgroup/skb")
 __description("spin_lock: test6 missing unlock")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_spin_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_spin_lock-ed region")
 __failure_unpriv __msg_unpriv("")
 __naked void spin_lock_test6_missing_unlock(void)
 {
-- 
2.43.5


* [PATCH bpf-next v3 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc
  2024-11-27 16:58 [PATCH bpf-next v3 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2024-11-27 16:58 ` [PATCH bpf-next v3 5/7] bpf: Improve verifier log for resource leak on exit Kumar Kartikeya Dwivedi
@ 2024-11-27 16:58 ` Kumar Kartikeya Dwivedi
  2024-11-27 16:58 ` [PATCH bpf-next v3 7/7] selftests/bpf: Add IRQ save/restore tests Kumar Kartikeya Dwivedi
  6 siblings, 0 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-27 16:58 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, kernel-team

For preemption-related kfuncs, we don't test their interaction with
sleepable kfuncs (we do test helpers) even though the verifier has
code to protect against such a pattern. Expand coverage of the selftest
to include this case.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/progs/preempt_lock.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/preempt_lock.c b/tools/testing/selftests/bpf/progs/preempt_lock.c
index 5269571cf7b5..788cf155d641 100644
--- a/tools/testing/selftests/bpf/progs/preempt_lock.c
+++ b/tools/testing/selftests/bpf/progs/preempt_lock.c
@@ -113,6 +113,18 @@ int preempt_sleepable_helper(void *ctx)
 	return 0;
 }
 
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("kernel func bpf_copy_from_user_str is sleepable within non-preemptible region")
+int preempt_sleepable_kfunc(void *ctx)
+{
+	u32 data;
+
+	bpf_preempt_disable();
+	bpf_copy_from_user_str(&data, sizeof(data), NULL, 0);
+	bpf_preempt_enable();
+	return 0;
+}
+
 int __noinline preempt_global_subprog(void)
 {
 	preempt_balance_subprog();
-- 
2.43.5


* [PATCH bpf-next v3 7/7] selftests/bpf: Add IRQ save/restore tests
  2024-11-27 16:58 [PATCH bpf-next v3 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2024-11-27 16:58 ` [PATCH bpf-next v3 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc Kumar Kartikeya Dwivedi
@ 2024-11-27 16:58 ` Kumar Kartikeya Dwivedi
  6 siblings, 0 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-27 16:58 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, kernel-team

Include tests that check for rejection in erroneous cases, such as
unbalanced IRQ-disabled counts (within and across subprogs), invalid
IRQ flag state or input to kfuncs, overwriting the saved IRQ state on
the stack, interaction with sleepable kfuncs/helpers and global
functions, and out-of-order restore. Include some success scenarios as
well to demonstrate usage.

#128/1   irq/irq_save_bad_arg:OK
#128/2   irq/irq_restore_bad_arg:OK
#128/3   irq/irq_restore_missing_2:OK
#128/4   irq/irq_restore_missing_3:OK
#128/5   irq/irq_restore_missing_3_minus_2:OK
#128/6   irq/irq_restore_missing_1_subprog:OK
#128/7   irq/irq_restore_missing_2_subprog:OK
#128/8   irq/irq_restore_missing_3_subprog:OK
#128/9   irq/irq_restore_missing_3_minus_2_subprog:OK
#128/10  irq/irq_balance:OK
#128/11  irq/irq_balance_n:OK
#128/12  irq/irq_balance_subprog:OK
#128/13  irq/irq_global_subprog:OK
#128/14  irq/irq_restore_ooo:OK
#128/15  irq/irq_restore_ooo_3:OK
#128/16  irq/irq_restore_3_subprog:OK
#128/17  irq/irq_restore_4_subprog:OK
#128/18  irq/irq_restore_ooo_3_subprog:OK
#128/19  irq/irq_restore_invalid:OK
#128/20  irq/irq_save_invalid:OK
#128/21  irq/irq_restore_iter:OK
#128/22  irq/irq_save_iter:OK
#128/23  irq/irq_flag_overwrite:OK
#128/24  irq/irq_flag_overwrite_partial:OK
#128/25  irq/irq_sleepable_helper:OK
#128/26  irq/irq_sleepable_kfunc:OK
#128     irq:OK
Summary: 1/26 PASSED, 0 SKIPPED, 0 FAILED

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 tools/testing/selftests/bpf/progs/irq.c       | 397 ++++++++++++++++++
 2 files changed, 399 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/irq.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index d9f65adb456b..b1b4d69c407a 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -98,6 +98,7 @@
 #include "verifier_xdp_direct_packet_access.skel.h"
 #include "verifier_bits_iter.skel.h"
 #include "verifier_lsm.skel.h"
+#include "irq.skel.h"
 
 #define MAX_ENTRIES 11
 
@@ -225,6 +226,7 @@ void test_verifier_xdp(void)                  { RUN(verifier_xdp); }
 void test_verifier_xdp_direct_packet_access(void) { RUN(verifier_xdp_direct_packet_access); }
 void test_verifier_bits_iter(void) { RUN(verifier_bits_iter); }
 void test_verifier_lsm(void)                  { RUN(verifier_lsm); }
+void test_irq(void)			      { RUN(irq); }
 
 void test_verifier_mtu(void)
 {
diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c
new file mode 100644
index 000000000000..b5056ac17384
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/irq.c
@@ -0,0 +1,397 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+unsigned long global_flags;
+
+extern void bpf_local_irq_save(unsigned long *) __weak __ksym;
+extern void bpf_local_irq_restore(unsigned long *) __weak __ksym;
+extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym;
+
+SEC("?tc")
+__failure __msg("arg#0 doesn't point to an irq flag on stack")
+int irq_save_bad_arg(struct __sk_buff *ctx)
+{
+	bpf_local_irq_save(&global_flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("arg#0 doesn't point to an irq flag on stack")
+int irq_restore_bad_arg(struct __sk_buff *ctx)
+{
+	bpf_local_irq_restore(&global_flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_2(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_3(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	bpf_local_irq_save(&flags3);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_3_minus_2(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	bpf_local_irq_save(&flags3);
+	bpf_local_irq_restore(&flags3);
+	bpf_local_irq_restore(&flags2);
+	return 0;
+}
+
+static __noinline void local_irq_save(unsigned long *flags)
+{
+	bpf_local_irq_save(flags);
+}
+
+static __noinline void local_irq_restore(unsigned long *flags)
+{
+	bpf_local_irq_restore(flags);
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_1_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	local_irq_save(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_2_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_3_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	local_irq_save(&flags3);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_3_minus_2_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	local_irq_save(&flags3);
+	local_irq_restore(&flags3);
+	local_irq_restore(&flags2);
+	return 0;
+}
+
+SEC("?tc")
+__success
+int irq_balance(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	local_irq_save(&flags);
+	local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__success
+int irq_balance_n(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	local_irq_save(&flags3);
+	local_irq_restore(&flags3);
+	local_irq_restore(&flags2);
+	local_irq_restore(&flags1);
+	return 0;
+}
+
+static __noinline void local_irq_balance(void)
+{
+	unsigned long flags;
+
+	local_irq_save(&flags);
+	local_irq_restore(&flags);
+}
+
+static __noinline void local_irq_balance_n(void)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	local_irq_save(&flags3);
+	local_irq_restore(&flags3);
+	local_irq_restore(&flags2);
+	local_irq_restore(&flags1);
+}
+
+SEC("?tc")
+__success
+int irq_balance_subprog(struct __sk_buff *ctx)
+{
+	local_irq_balance();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("sleepable helper bpf_copy_from_user#")
+int irq_sleepable_helper(void *ctx)
+{
+	unsigned long flags;
+	u32 data;
+
+	local_irq_save(&flags);
+	bpf_copy_from_user(&data, sizeof(data), NULL);
+	local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("kernel func bpf_copy_from_user_str is sleepable within IRQ-disabled region")
+int irq_sleepable_kfunc(void *ctx)
+{
+	unsigned long flags;
+	u32 data;
+
+	local_irq_save(&flags);
+	bpf_copy_from_user_str(&data, sizeof(data), NULL, 0);
+	local_irq_restore(&flags);
+	return 0;
+}
+
+int __noinline global_local_irq_balance(void)
+{
+	local_irq_balance_n();
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("global function calls are not allowed with IRQs disabled")
+int irq_global_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	bpf_local_irq_save(&flags);
+	global_local_irq_balance();
+	bpf_local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_restore_ooo(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	bpf_local_irq_restore(&flags1);
+	bpf_local_irq_restore(&flags2);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_restore_ooo_3(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	bpf_local_irq_restore(&flags2);
+	bpf_local_irq_save(&flags3);
+	bpf_local_irq_restore(&flags1);
+	bpf_local_irq_restore(&flags3);
+	return 0;
+}
+
+static __noinline void local_irq_save_3(unsigned long *flags1, unsigned long *flags2,
+					unsigned long *flags3)
+{
+	local_irq_save(flags1);
+	local_irq_save(flags2);
+	local_irq_save(flags3);
+}
+
+SEC("?tc")
+__success
+int irq_restore_3_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save_3(&flags1, &flags2, &flags3);
+	bpf_local_irq_restore(&flags3);
+	bpf_local_irq_restore(&flags2);
+	bpf_local_irq_restore(&flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_restore_4_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+	unsigned long flags4;
+
+	local_irq_save_3(&flags1, &flags2, &flags3);
+	bpf_local_irq_restore(&flags3);
+	bpf_local_irq_save(&flags4);
+	bpf_local_irq_restore(&flags4);
+	bpf_local_irq_restore(&flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_restore_ooo_3_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save_3(&flags1, &flags2, &flags3);
+	bpf_local_irq_restore(&flags3);
+	bpf_local_irq_restore(&flags2);
+	bpf_local_irq_save(&flags3);
+	bpf_local_irq_restore(&flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected an initialized")
+int irq_restore_invalid(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags = 0xfaceb00c;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected uninitialized")
+int irq_save_invalid(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected an initialized")
+int irq_restore_iter(struct __sk_buff *ctx)
+{
+	struct bpf_iter_num it;
+
+	bpf_iter_num_new(&it, 0, 42);
+	bpf_local_irq_restore((unsigned long *)&it);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference id=1")
+int irq_save_iter(struct __sk_buff *ctx)
+{
+	struct bpf_iter_num it;
+
+	/* Ensure same sized slot has st->ref_obj_id set, so we reject based on
+	 * slot_type != STACK_IRQ_FLAG...
+	 */
+	_Static_assert(sizeof(it) == sizeof(unsigned long), "broken iterator size");
+
+	bpf_iter_num_new(&it, 0, 42);
+	bpf_local_irq_save((unsigned long *)&it);
+	bpf_local_irq_restore((unsigned long *)&it);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected an initialized")
+int irq_flag_overwrite(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	bpf_local_irq_save(&flags);
+	flags = 0xdeadbeef;
+	bpf_local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected an initialized")
+int irq_flag_overwrite_partial(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	bpf_local_irq_save(&flags);
+	*(((char *)&flags) + 1) = 0xff;
+	bpf_local_irq_restore(&flags);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.5
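The irq_restore_ooo* failures and the "expected uninitialized" / "expected an initialized" checks in the tests above together amount to a LIFO discipline over saved flags. The following is a minimal sketch under that reading. The `irq_model` type, the `model_*` functions and the `IRQ_*` codes are hypothetical, and each slot stands in for one 8-byte stack slot. This is not how the verifier itself implements the check.

```c
#include <assert.h>

#define MAX_SLOTS 8

/* Hypothetical model: slot_id[i] is 0 when slot i holds no saved IRQ
 * flag, else the id of the save that initialized it. Outstanding saves
 * are kept on a stack (oldest first) so restores can be LIFO-checked.
 * Each outstanding save occupies a distinct slot, so the stack can
 * never hold more than MAX_SLOTS entries. */
struct irq_model {
	int slot_id[MAX_SLOTS];
	int stack[MAX_SLOTS];
	int depth;
	int next_id;
};

enum {
	IRQ_OK = 0,
	IRQ_EINIT = -1,   /* "expected uninitialized" */
	IRQ_ENOINIT = -2, /* "expected an initialized ..." */
	IRQ_EOOO = -3,    /* "cannot restore irq state out of order" */
	IRQ_ELEAK = -4,   /* exit with IRQs still disabled */
};

/* Caller passes 0 <= slot < MAX_SLOTS. */
static int model_irq_save(struct irq_model *m, int slot)
{
	if (m->slot_id[slot] != 0)
		return IRQ_EINIT;
	m->slot_id[slot] = ++m->next_id;
	m->stack[m->depth++] = m->slot_id[slot];
	return IRQ_OK;
}

static int model_irq_restore(struct irq_model *m, int slot)
{
	int id = m->slot_id[slot];

	if (id == 0)
		return IRQ_ENOINIT;
	/* id != 0 implies it is on the stack, so depth >= 1 here. */
	if (m->stack[m->depth - 1] != id)
		return IRQ_EOOO;
	m->depth--;
	m->slot_id[slot] = 0;
	return IRQ_OK;
}

static int model_exit(const struct irq_model *m)
{
	return m->depth ? IRQ_ELEAK : IRQ_OK;
}
```

Keeping outstanding saves on a stack makes the rule easy to state. Only the most recent unrestored save may be restored, and every save must be restored before exit.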



* Re: [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state
  2024-11-27 16:58 ` [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state Kumar Kartikeya Dwivedi
@ 2024-11-28  2:39   ` Eduard Zingerman
  2024-11-28  2:54     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 21+ messages in thread
From: Eduard Zingerman @ 2024-11-28  2:39 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:
> Currently, state for RCU read locks and preemption is in
> bpf_verifier_state, while locks and pointer reference state remains in
> bpf_func_state. There is no particular reason to keep the latter in
> bpf_func_state. Additionally, it is copied into a new frame's state and
> copied back to the caller frame's state everytime the verifier processes
> a pseudo call instruction. This is a bit wasteful, given this state is
> global for a given verification state / path.
> 
> Move all resource and reference related state in bpf_verifier_state
> structure in this patch, in preparation for introducing new reference
> state types in the future.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

lgtm, but please fix the 'print_verifier_state' note below.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

> ---
>  include/linux/bpf_verifier.h |  11 ++--
>  kernel/bpf/log.c             |  11 ++--
>  kernel/bpf/verifier.c        | 112 ++++++++++++++++-------------------
>  3 files changed, 64 insertions(+), 70 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index f4290c179bee..af64b5415df8 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -315,9 +315,6 @@ struct bpf_func_state {
>  	u32 callback_depth;
>  
>  	/* The following fields should be last. See copy_func_state() */
> -	int acquired_refs;
> -	int active_locks;
> -	struct bpf_reference_state *refs;
>  	/* The state of the stack. Each element of the array describes BPF_REG_SIZE
>  	 * (i.e. 8) bytes worth of stack memory.
>  	 * stack[0] represents bytes [*(r10-8)..*(r10-1)]
> @@ -419,9 +416,13 @@ struct bpf_verifier_state {
>  	u32 insn_idx;
>  	u32 curframe;
>  
> -	bool speculative;
> +	struct bpf_reference_state *refs;
> +	u32 acquired_refs;
> +	u32 active_locks;
> +	u32 active_preempt_locks;
>  	bool active_rcu_lock;
> -	u32 active_preempt_lock;
> +
> +	bool speculative;

Nit: pahole says there are two holes here:

     $ pahole kernel/bpf/verifier.o
     ...
     struct bpf_verifier_state {
        struct bpf_func_state *    frame[8];             /*     0    64 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct bpf_verifier_state * parent;              /*    64     8 */
        u32                        branches;             /*    72     4 */
        u32                        insn_idx;             /*    76     4 */
        u32                        curframe;             /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        struct bpf_reference_state * refs;               /*    88     8 */
        u32                        acquired_refs;        /*    96     4 */
        u32                        active_locks;         /*   100     4 */
        u32                        active_preempt_locks; /*   104     4 */
        u32                        active_irq_id;        /*   108     4 */
        bool                       active_rcu_lock;      /*   112     1 */
        bool                       speculative;          /*   113     1 */
        bool                       used_as_loop_entry;   /*   114     1 */
        bool                       in_sleepable;         /*   115     1 */
        u32                        first_insn_idx;       /*   116     4 */
        u32                        last_insn_idx;        /*   120     4 */

        /* XXX 4 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        struct bpf_verifier_state * loop_entry;          /*   128     8 */
        u32                        insn_hist_start;      /*   136     4 */
        u32                        insn_hist_end;        /*   140     4 */
        u32                        dfs_depth;            /*   144     4 */
        u32                        callback_unroll_depth; /*   148     4 */
        u32                        may_goto_depth;       /*   152     4 */

        /* size: 160, cachelines: 3, members: 22 */
        /* sum members: 148, holes: 2, sum holes: 8 */

    maybe move the 'refs' pointer?
    e.g. moving it after 'parent' makes both holes disappear.
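The effect of moving 'refs' can be reproduced in plain C. The sketch below mirrors only the member types and order from the pahole dump above. The struct names are hypothetical, and the pointed-to types are replaced by an opaque struct. It assumes an LP64 ABI such as x86-64, where pointers are 8-byte aligned and u32 is 4-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct opaque; /* stands in for the pointed-to verifier types */

/* Member order as reported by pahole for the v3 patch. */
struct vstate_as_posted {
	struct opaque *frame[8];
	struct opaque *parent;
	uint32_t branches;
	uint32_t insn_idx;
	uint32_t curframe;
	/* 4-byte hole: refs needs 8-byte alignment */
	struct opaque *refs;
	uint32_t acquired_refs;
	uint32_t active_locks;
	uint32_t active_preempt_locks;
	uint32_t active_irq_id;
	bool active_rcu_lock;
	bool speculative;
	bool used_as_loop_entry;
	bool in_sleepable;
	uint32_t first_insn_idx;
	uint32_t last_insn_idx;
	/* 4-byte hole: loop_entry needs 8-byte alignment */
	struct opaque *loop_entry;
	uint32_t insn_hist_start;
	uint32_t insn_hist_end;
	uint32_t dfs_depth;
	uint32_t callback_unroll_depth;
	uint32_t may_goto_depth;
	/* 4 bytes of tail padding */
};

/* Same members, with refs moved right after parent as suggested. */
struct vstate_refs_after_parent {
	struct opaque *frame[8];
	struct opaque *parent;
	struct opaque *refs;
	uint32_t branches;
	uint32_t insn_idx;
	uint32_t curframe;
	uint32_t acquired_refs;
	uint32_t active_locks;
	uint32_t active_preempt_locks;
	uint32_t active_irq_id;
	bool active_rcu_lock;
	bool speculative;
	bool used_as_loop_entry;
	bool in_sleepable;
	uint32_t first_insn_idx;
	uint32_t last_insn_idx;
	struct opaque *loop_entry;
	uint32_t insn_hist_start;
	uint32_t insn_hist_end;
	uint32_t dfs_depth;
	uint32_t callback_unroll_depth;
	uint32_t may_goto_depth;
	/* 4 bytes of tail padding, no interior holes */
};
```

With refs placed right after parent, every pointer lands on an 8-byte boundary without filler. The structure therefore shrinks from 160 to 152 bytes. Four bytes of tail padding remain because the members sum to 148 bytes.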

>  	/* If this state was ever pointed-to by other state's loop_entry field
>  	 * this flag would be set to true. Used to avoid freeing such states
>  	 * while they are still in use.
> diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
> index 4a858fdb6476..8b52e5b7504c 100644
> --- a/kernel/bpf/log.c
> +++ b/kernel/bpf/log.c
> @@ -756,6 +756,7 @@ static void print_reg_state(struct bpf_verifier_env *env,
>  void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_state *state,
>  			  bool print_all)
>  {
> +	struct bpf_verifier_state *vstate = env->cur_state;

This is not always true.
For example, __mark_chain_precision does 'print_verifier_state(env, func, true)'
for func obtained as 'func = st->frame[fr];' where 'st' iterates over parents
of env->cur_state.

>  	const struct bpf_reg_state *reg;
>  	int i;
>  

[...]



* Re: [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state
  2024-11-28  2:39   ` Eduard Zingerman
@ 2024-11-28  2:54     ` Kumar Kartikeya Dwivedi
  2024-11-28  3:03       ` Eduard Zingerman
  0 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-28  2:54 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 28 Nov 2024 at 03:39, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:
> > Currently, state for RCU read locks and preemption is in
> > bpf_verifier_state, while locks and pointer reference state remains in
> > bpf_func_state. There is no particular reason to keep the latter in
> > bpf_func_state. Additionally, it is copied into a new frame's state and
> > copied back to the caller frame's state everytime the verifier processes
> > a pseudo call instruction. This is a bit wasteful, given this state is
> > global for a given verification state / path.
> >
> > Move all resource and reference related state in bpf_verifier_state
> > structure in this patch, in preparation for introducing new reference
> > state types in the future.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>
> lgtm, but please fix the 'print_verifier_state' note below.
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> > ---
> >  include/linux/bpf_verifier.h |  11 ++--
> >  kernel/bpf/log.c             |  11 ++--
> >  kernel/bpf/verifier.c        | 112 ++++++++++++++++-------------------
> >  3 files changed, 64 insertions(+), 70 deletions(-)
> >
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index f4290c179bee..af64b5415df8 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -315,9 +315,6 @@ struct bpf_func_state {
> >       u32 callback_depth;
> >
> >       /* The following fields should be last. See copy_func_state() */
> > -     int acquired_refs;
> > -     int active_locks;
> > -     struct bpf_reference_state *refs;
> >       /* The state of the stack. Each element of the array describes BPF_REG_SIZE
> >        * (i.e. 8) bytes worth of stack memory.
> >        * stack[0] represents bytes [*(r10-8)..*(r10-1)]
> > @@ -419,9 +416,13 @@ struct bpf_verifier_state {
> >       u32 insn_idx;
> >       u32 curframe;
> >
> > -     bool speculative;
> > +     struct bpf_reference_state *refs;
> > +     u32 acquired_refs;
> > +     u32 active_locks;
> > +     u32 active_preempt_locks;
> >       bool active_rcu_lock;
> > -     u32 active_preempt_lock;
> > +
> > +     bool speculative;
>
> Nit: pahole says there are two holes here:
>
>      $ pahole kernel/bpf/verifier.o
>      ...
>      struct bpf_verifier_state {
>         struct bpf_func_state *    frame[8];             /*     0    64 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         struct bpf_verifier_state * parent;              /*    64     8 */
>         u32                        branches;             /*    72     4 */
>         u32                        insn_idx;             /*    76     4 */
>         u32                        curframe;             /*    80     4 */
>
>         /* XXX 4 bytes hole, try to pack */
>
>         struct bpf_reference_state * refs;               /*    88     8 */
>         u32                        acquired_refs;        /*    96     4 */
>         u32                        active_locks;         /*   100     4 */
>         u32                        active_preempt_locks; /*   104     4 */
>         u32                        active_irq_id;        /*   108     4 */
>         bool                       active_rcu_lock;      /*   112     1 */
>         bool                       speculative;          /*   113     1 */
>         bool                       used_as_loop_entry;   /*   114     1 */
>         bool                       in_sleepable;         /*   115     1 */
>         u32                        first_insn_idx;       /*   116     4 */
>         u32                        last_insn_idx;        /*   120     4 */
>
>         /* XXX 4 bytes hole, try to pack */
>
>         /* --- cacheline 2 boundary (128 bytes) --- */
>         struct bpf_verifier_state * loop_entry;          /*   128     8 */
>         u32                        insn_hist_start;      /*   136     4 */
>         u32                        insn_hist_end;        /*   140     4 */
>         u32                        dfs_depth;            /*   144     4 */
>         u32                        callback_unroll_depth; /*   148     4 */
>         u32                        may_goto_depth;       /*   152     4 */
>
>         /* size: 160, cachelines: 3, members: 22 */
>         /* sum members: 148, holes: 2, sum holes: 8 */
>
>     maybe move the 'refs' pointer?
>     e.g. moving it after 'parent' makes both holes disappear.

Ack, will fix.

>
> >       /* If this state was ever pointed-to by other state's loop_entry field
> >        * this flag would be set to true. Used to avoid freeing such states
> >        * while they are still in use.
> > diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
> > index 4a858fdb6476..8b52e5b7504c 100644
> > --- a/kernel/bpf/log.c
> > +++ b/kernel/bpf/log.c
> > @@ -756,6 +756,7 @@ static void print_reg_state(struct bpf_verifier_env *env,
> >  void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_state *state,
> >                         bool print_all)
> >  {
> > +     struct bpf_verifier_state *vstate = env->cur_state;
>
> This is not always true.
> For example, __mark_chain_precision does 'print_verifier_state(env, func, true)'
> for func obtained as 'func = st->frame[fr];' where 'st' iterates over parents
> of env->cur_state.

Looking through the code, I'm thinking the only proper fix is
explicitly passing in the verifier state, I was hoping there would be
a link from func_state -> verifier_state but it is not the case.
Regardless, explicitly passing in the verifier state is probably cleaner. WDYT?

>
> >       const struct bpf_reg_state *reg;
> >       int i;
> >
>
> [...]
>


* Re: [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state
  2024-11-28  2:54     ` Kumar Kartikeya Dwivedi
@ 2024-11-28  3:03       ` Eduard Zingerman
  2024-11-28  3:18         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 21+ messages in thread
From: Eduard Zingerman @ 2024-11-28  3:03 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 2024-11-28 at 03:54 +0100, Kumar Kartikeya Dwivedi wrote:

[...]

> > > --- a/kernel/bpf/log.c
> > > +++ b/kernel/bpf/log.c
> > > @@ -756,6 +756,7 @@ static void print_reg_state(struct bpf_verifier_env *env,
> > >  void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_state *state,
> > >                         bool print_all)
> > >  {
> > > +     struct bpf_verifier_state *vstate = env->cur_state;
> > 
> > This is not always true.
> > For example, __mark_chain_precision does 'print_verifier_state(env, func, true)'
> > for func obtained as 'func = st->frame[fr];' where 'st' iterates over parents
> > of env->cur_state.
> 
> Looking through the code, I'm thinking the only proper fix is
> explicitly passing in the verifier state, I was hoping there would be
> a link from func_state -> verifier_state but it is not the case.
> Regardless, explicitly passing in the verifier state is probably cleaner. WDYT?

Seems like it is (I'd also pass the frame number, instead of function
state pointer, just to make it clear where the function state comes from,
but feel free to ignore this suggestion).

[...]



* Re: [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state
  2024-11-28  3:03       ` Eduard Zingerman
@ 2024-11-28  3:18         ` Kumar Kartikeya Dwivedi
  2024-11-28  3:22           ` Eduard Zingerman
  0 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-28  3:18 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 28 Nov 2024 at 04:03, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-11-28 at 03:54 +0100, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > > > --- a/kernel/bpf/log.c
> > > > +++ b/kernel/bpf/log.c
> > > > @@ -756,6 +756,7 @@ static void print_reg_state(struct bpf_verifier_env *env,
> > > >  void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_state *state,
> > > >                         bool print_all)
> > > >  {
> > > > +     struct bpf_verifier_state *vstate = env->cur_state;
> > >
> > > This is not always true.
> > > For example, __mark_chain_precision does 'print_verifier_state(env, func, true)'
> > > for func obtained as 'func = st->frame[fr];' where 'st' iterates over parents
> > > of env->cur_state.
> >
> > Looking through the code, I'm thinking the only proper fix is
> > explicitly passing in the verifier state, I was hoping there would be
> > a link from func_state -> verifier_state but it is not the case.
> > Regardless, explicitly passing in the verifier state is probably cleaner. WDYT?
>
> Seems like it is (I'd also pass the frame number, instead of function
> state pointer, just to make it clear where the function state comes from,
> but feel free to ignore this suggestion).

I made this change, but not passing the frame number: while most call
sites have the frame number (or pass curframe), it needs to be
obtained explicitly for some, so I think it won't be worth it.

>
> [...]
>


* Re: [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state
  2024-11-28  3:18         ` Kumar Kartikeya Dwivedi
@ 2024-11-28  3:22           ` Eduard Zingerman
  2024-11-28  3:32             ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 21+ messages in thread
From: Eduard Zingerman @ 2024-11-28  3:22 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 2024-11-28 at 04:18 +0100, Kumar Kartikeya Dwivedi wrote:
> On Thu, 28 Nov 2024 at 04:03, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Thu, 2024-11-28 at 03:54 +0100, Kumar Kartikeya Dwivedi wrote:
> > 
> > [...]
> > 
> > > > > --- a/kernel/bpf/log.c
> > > > > +++ b/kernel/bpf/log.c
> > > > > @@ -756,6 +756,7 @@ static void print_reg_state(struct bpf_verifier_env *env,
> > > > >  void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_state *state,
> > > > >                         bool print_all)
> > > > >  {
> > > > > +     struct bpf_verifier_state *vstate = env->cur_state;
> > > > 
> > > > This is not always true.
> > > > For example, __mark_chain_precision does 'print_verifier_state(env, func, true)'
> > > > for func obtained as 'func = st->frame[fr];' where 'st' iterates over parents
> > > > of env->cur_state.
> > > 
> > > Looking through the code, I'm thinking the only proper fix is
> > > explicitly passing in the verifier state, I was hoping there would be
> > > a link from func_state -> verifier_state but it is not the case.
> > > Regardless, explicitly passing in the verifier state is probably cleaner. WDYT?
> > 
> > Seems like it is (I'd also pass the frame number, instead of function
> > state pointer, just to make it clear where the function state comes from,
> > but feel free to ignore this suggestion).
> 
> I made this change, but not passing the frame number: while most call
> sites have the frame number (or pass curframe), it needs to be
> obtained explicitly for some, so I think it won't be worth it.

Understood, thank you.



* Re: [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state
  2024-11-28  3:22           ` Eduard Zingerman
@ 2024-11-28  3:32             ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-28  3:32 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 28 Nov 2024 at 04:22, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-11-28 at 04:18 +0100, Kumar Kartikeya Dwivedi wrote:
> > On Thu, 28 Nov 2024 at 04:03, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Thu, 2024-11-28 at 03:54 +0100, Kumar Kartikeya Dwivedi wrote:
> > >
> > > [...]
> > >
> > > > > > --- a/kernel/bpf/log.c
> > > > > > +++ b/kernel/bpf/log.c
> > > > > > @@ -756,6 +756,7 @@ static void print_reg_state(struct bpf_verifier_env *env,
> > > > > >  void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_state *state,
> > > > > >                         bool print_all)
> > > > > >  {
> > > > > > +     struct bpf_verifier_state *vstate = env->cur_state;
> > > > >
> > > > > This is not always true.
> > > > > For example, __mark_chain_precision does 'print_verifier_state(env, func, true)'
> > > > > for func obtained as 'func = st->frame[fr];' where 'st' iterates over parents
> > > > > of env->cur_state.
> > > >
> > > > Looking through the code, I'm thinking the only proper fix is
> > > > explicitly passing in the verifier state, I was hoping there would be
> > > > a link from func_state -> verifier_state but it is not the case.
> > > > Regardless, explicitly passing in the verifier state is probably cleaner. WDYT?
> > >
> > > Seems like it is (I'd also pass the frame number, instead of function
> > > state pointer, just to make it clear where the function state comes from,
> > > but feel free to ignore this suggestion).
> >
> > I made this change, but not passing the frame number: while most call
> > sites have the frame number (or pass curframe), it needs to be
> > obtained explicitly for some, so I think it won't be worth it.
>
> Understood, thank you.
>

Ok, scratch the previous reply. I forgot you can simply use
func->frameno to get it; I was trying dumb things (like func -
st->frame).
I do agree it's better to pass the frameno, just for the off chance
that you end up passing a vstate and func that mismatch.
So I ended up making the change after all. Sorry for the confusion.
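The signature change agreed on above can be sketched in user space as follows. The struct names, fields, `MAX_FRAMES`, and `print_state()` are hypothetical stand-ins, not the verifier's real types. The only point is that looking the function state up from `(vstate, frameno)` makes it impossible to pass a function state that belongs to a different verifier state. A caller walking parent states, such as `__mark_chain_precision`, would then pass `st` and `fr` rather than `st->frame[fr]`.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_FRAMES 8 /* illustrative limit on call depth */

/* Hypothetical, heavily reduced stand-ins for the verifier structs. */
struct func_state {
	int frameno;
};

struct verifier_state {
	struct func_state *frame[MAX_FRAMES];
	int curframe;
	int acquired_refs;
};

/* Print one frame of 'vstate'. The function state is derived from
 * (vstate, frameno), so it always belongs to the state being printed.
 * Returns the number of characters printed, or -1 for a bad frameno.
 */
static int print_state(const struct verifier_state *vstate, int frameno)
{
	const struct func_state *func;

	if (frameno < 0 || frameno > vstate->curframe)
		return -1;
	func = vstate->frame[frameno];
	return printf("frame%d: acquired_refs=%d\n", func->frameno,
		      vstate->acquired_refs);
}
```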

* Re: [PATCH bpf-next v3 2/7] bpf: Refactor {acquire,release}_reference_state
  2024-11-27 16:58 ` [PATCH bpf-next v3 2/7] bpf: Refactor {acquire,release}_reference_state Kumar Kartikeya Dwivedi
@ 2024-11-28  4:13   ` Eduard Zingerman
  2024-11-28  4:30     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 21+ messages in thread
From: Eduard Zingerman @ 2024-11-28  4:13 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:

Overall looks good, but please take a look at a few notes below.

[...]

> @@ -1349,77 +1350,69 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
>   * On success, returns a valid pointer id to associate with the register
>   * On failure, returns a negative errno.
>   */
> -static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
> +static struct bpf_reference_state *acquire_reference_state(struct bpf_verifier_env *env, int insn_idx, bool gen_id)
>  {
>  	struct bpf_verifier_state *state = env->cur_state;
>  	int new_ofs = state->acquired_refs;
> -	int id, err;
> +	int err;
>  
>  	err = resize_reference_state(state, state->acquired_refs + 1);
>  	if (err)
> -		return err;
> -	id = ++env->id_gen;
> -	state->refs[new_ofs].type = REF_TYPE_PTR;
> -	state->refs[new_ofs].id = id;
> +		return NULL;
> +	if (gen_id)
> +		state->refs[new_ofs].id = ++env->id_gen;

Nit: state->refs[new_ofs].id might end up with a garbage value if 'gen_id' is false.
     The resize_reference_state() uses realloc_array(),
     which allocates memory with GFP_KERNEL, but without the __GFP_ZERO flag.
     This is not a problem with the current patch, as you always check
     the reference type before checking the id, but most of the data structures
     in the verifier are zero initialized just in case.
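A user-space sketch of the zero-initialization convention mentioned here: grow the array with `realloc()` and clear only the newly added tail. `grow_refs_zeroed()` and the reduced `struct ref_state` are hypothetical, and this sketch makes no claim about how the kernel's `realloc_array()` helper itself behaves.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced reference entry. */
struct ref_state {
	int type;
	int id;
	int insn_idx;
};

/* Grow 'arr' from old_n to new_n entries and zero only the new tail, so a
 * field that a caller forgets to set reads as 0 instead of stale heap data.
 * Returns the (possibly moved) array, or NULL on allocation failure, in
 * which case the original array is still valid and owned by the caller.
 */
static struct ref_state *grow_refs_zeroed(struct ref_state *arr, size_t old_n,
					  size_t new_n)
{
	struct ref_state *tmp;

	if (new_n <= old_n)
		return arr; /* nothing to grow */
	tmp = realloc(arr, new_n * sizeof(*tmp));
	if (!tmp)
		return NULL;
	memset(tmp + old_n, 0, (new_n - old_n) * sizeof(*tmp));
	return tmp;
}
```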

>  	state->refs[new_ofs].insn_idx = insn_idx;
>  
> -	return id;
> +	return &state->refs[new_ofs];
> +}

[...]

> -/* release function corresponding to acquire_reference_state(). Idempotent. */
> -static int release_reference_state(struct bpf_verifier_state *state, int ptr_id)
> +static void release_reference_state(struct bpf_verifier_state *state, int idx)
>  {
> -	int i, last_idx;
> +	int last_idx;
>  
>  	last_idx = state->acquired_refs - 1;
> -	for (i = 0; i < state->acquired_refs; i++) {
> -		if (state->refs[i].type != REF_TYPE_PTR)
> -			continue;
> -		if (state->refs[i].id == ptr_id) {
> -			if (last_idx && i != last_idx)
> -				memcpy(&state->refs[i], &state->refs[last_idx],
> -				       sizeof(*state->refs));
> -			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
> -			state->acquired_refs--;
> -			return 0;
> -		}
> -	}
> -	return -EINVAL;
> +	if (last_idx && idx != last_idx)
> +		memcpy(&state->refs[idx], &state->refs[last_idx], sizeof(*state->refs));
> +	memset(&state->refs[last_idx], 0, sizeof(*state->refs));
> +	state->acquired_refs--;
> +	return;
>  }

Such an implementation replaces the element at 'idx' with the element at 'last_idx'.
If the intention is to use 'state->refs' as a stack of acquired irq flags,
this trick breaks the stack property.
E.g. consider the array [a, b, c, d] where 'idx' points to 'b':
after release_reference_state() the array becomes [a, d, c] rather than [a, c, d].
You need to do 'memmove' instead.
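The difference between the two removal strategies can be checked with a small user-space sketch on a plain `int` array. `remove_swap()` mirrors the `memcpy()` approach in the patch, and `remove_ordered()` shifts the tail down with `memmove()`; both names are hypothetical. With `[1, 2, 3, 4]` and `idx = 1`, the first yields `[1, 4, 3]` and the second `[1, 3, 4]`.

```c
#include <assert.h>
#include <string.h>

/* Swap-remove: O(1), but the last element jumps into the hole, so the
 * relative order of the remaining elements is not preserved.
 */
static void remove_swap(int *arr, int *n, int idx)
{
	int last = *n - 1;

	if (idx != last)
		memcpy(&arr[idx], &arr[last], sizeof(*arr));
	memset(&arr[last], 0, sizeof(*arr));
	(*n)--;
}

/* Order-preserving removal: shift everything after 'idx' down by one.
 * Source and destination overlap, so memmove() is required.
 */
static void remove_ordered(int *arr, int *n, int idx)
{
	int last = *n - 1;

	if (idx != last)
		memmove(&arr[idx], &arr[idx + 1], (size_t)(last - idx) * sizeof(*arr));
	memset(&arr[last], 0, sizeof(*arr));
	(*n)--;
}
```

The swap variant is cheaper, but any code that infers acquisition order from array position (as a stack of IRQ states does) needs the ordered variant.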

[...]

> @@ -9666,21 +9659,41 @@ static void mark_pkt_end(struct bpf_verifier_state *vstate, int regn, bool range
>  		reg->range = AT_PKT_END;
>  }
>  
> +static int release_reference_nomark(struct bpf_verifier_state *state, int ref_obj_id)
> +{
> +	int i;
> +
> +	for (i = 0; i < state->acquired_refs; i++) {
> +		if (state->refs[i].type != REF_TYPE_PTR)
> +			continue;
> +		if (state->refs[i].id == ref_obj_id) {
> +			release_reference_state(state, i);
> +			return 0;
> +		}
> +	}
> +	return -EINVAL;
> +}
> +
>  /* The pointer with the specified id has released its reference to kernel
>   * resources. Identify all copies of the same pointer and clear the reference.
> + *
> + * This is the release function corresponding to acquire_reference(). Idempotent.
> + * The 'mark' boolean is used to optionally skip scrubbing registers matching
          ^^^^^^
Nit: this is probably a remnant of some older patch revision;
     the function no longer takes a 'mark' parameter.

> + * the ref_obj_id, in case they need to be switched to some other type instead
> + * of havoc scalar value.
>   */
> -static int release_reference(struct bpf_verifier_env *env,
> -			     int ref_obj_id)
> +static int release_reference(struct bpf_verifier_env *env, int ref_obj_id)
>  {

[...]


* Re: [PATCH bpf-next v3 2/7] bpf: Refactor {acquire,release}_reference_state
  2024-11-28  4:13   ` Eduard Zingerman
@ 2024-11-28  4:30     ` Kumar Kartikeya Dwivedi
  2024-11-28  4:36       ` Eduard Zingerman
  0 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-28  4:30 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 28 Nov 2024 at 05:13, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:
>
> Overall looks good, but please take a look at a few notes below.
>
> [...]
>
> > @@ -1349,77 +1350,69 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
> >   * On success, returns a valid pointer id to associate with the register
> >   * On failure, returns a negative errno.
> >   */
> > -static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
> > +static struct bpf_reference_state *acquire_reference_state(struct bpf_verifier_env *env, int insn_idx, bool gen_id)
> >  {
> >       struct bpf_verifier_state *state = env->cur_state;
> >       int new_ofs = state->acquired_refs;
> > -     int id, err;
> > +     int err;
> >
> >       err = resize_reference_state(state, state->acquired_refs + 1);
> >       if (err)
> > -             return err;
> > -     id = ++env->id_gen;
> > -     state->refs[new_ofs].type = REF_TYPE_PTR;
> > -     state->refs[new_ofs].id = id;
> > +             return NULL;
> > +     if (gen_id)
> > +             state->refs[new_ofs].id = ++env->id_gen;
>
> Nit: state->refs[new_ofs].id might end up with a garbage value if 'gen_id' is false.
>      The resize_reference_state() uses realloc_array(),
>      which allocates memory with GFP_KERNEL, but without the __GFP_ZERO flag.
>      This is not a problem with the current patch, as you always check
>      the reference type before checking the id, but most of the data structures
>      in the verifier are zero initialized just in case.

When gen_id is false, the caller assigns s->id itself (e.g.
acquire_lock_state), so I think we'll be fine without __GFP_ZERO.
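A minimal sketch of the calling convention described here, assuming a fixed-size array in place of the resizable one. Every name (`acquire_ref()`, `acquire_ptr()`, `acquire_lock()`, the `REF_*` values) is hypothetical. The sketch only shows that each path writes `id` before anyone can read it, whether or not the slot memory was zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 16 /* fixed capacity instead of resizing, for brevity */

enum { REF_PTR = 1, REF_LOCK = 2 }; /* hypothetical type tags */

struct ref_state {
	int type;
	int id;
	int insn_idx;
};

struct vstate {
	struct ref_state refs[MAX_REFS];
	int acquired_refs;
	int id_gen;
};

/* Reserve a slot; only generate a fresh id when asked to. */
static struct ref_state *acquire_ref(struct vstate *st, int insn_idx, bool gen_id)
{
	struct ref_state *s;

	if (st->acquired_refs == MAX_REFS)
		return NULL;
	s = &st->refs[st->acquired_refs++];
	if (gen_id)
		s->id = ++st->id_gen;
	s->insn_idx = insn_idx;
	return s;
}

/* Pointer references get a generated id. */
static struct ref_state *acquire_ptr(struct vstate *st, int insn_idx)
{
	struct ref_state *s = acquire_ref(st, insn_idx, true);

	if (s)
		s->type = REF_PTR;
	return s;
}

/* Locks pass gen_id = false and then write the id themselves. */
static struct ref_state *acquire_lock(struct vstate *st, int insn_idx, int lock_id)
{
	struct ref_state *s = acquire_ref(st, insn_idx, false);

	if (s) {
		s->type = REF_LOCK;
		s->id = lock_id;
	}
	return s;
}
```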

>
> >       state->refs[new_ofs].insn_idx = insn_idx;
> >
> > -     return id;
> > +     return &state->refs[new_ofs];
> > +}
>
> [...]
>
> > -/* release function corresponding to acquire_reference_state(). Idempotent. */
> > -static int release_reference_state(struct bpf_verifier_state *state, int ptr_id)
> > +static void release_reference_state(struct bpf_verifier_state *state, int idx)
> >  {
> > -     int i, last_idx;
> > +     int last_idx;
> >
> >       last_idx = state->acquired_refs - 1;
> > -     for (i = 0; i < state->acquired_refs; i++) {
> > -             if (state->refs[i].type != REF_TYPE_PTR)
> > -                     continue;
> > -             if (state->refs[i].id == ptr_id) {
> > -                     if (last_idx && i != last_idx)
> > -                             memcpy(&state->refs[i], &state->refs[last_idx],
> > -                                    sizeof(*state->refs));
> > -                     memset(&state->refs[last_idx], 0, sizeof(*state->refs));
> > -                     state->acquired_refs--;
> > -                     return 0;
> > -             }
> > -     }
> > -     return -EINVAL;
> > +     if (last_idx && idx != last_idx)
> > +             memcpy(&state->refs[idx], &state->refs[last_idx], sizeof(*state->refs));
> > +     memset(&state->refs[last_idx], 0, sizeof(*state->refs));
> > +     state->acquired_refs--;
> > +     return;
> >  }
>
> Such an implementation replaces the element at 'idx' with the element at 'last_idx'.
> If the intention is to use 'state->refs' as a stack of acquired irq flags,
> this trick breaks the stack property.
> E.g. consider the array [a, b, c, d] where 'idx' points to 'b':
> after release_reference_state() the array becomes [a, d, c] rather than [a, c, d].
> You need to do 'memmove' instead.
>

Wow, great catch. Thanks for spotting this. I'll fix this and see
if I can add a selftest that would have triggered this particular
pattern.

> [...]
>
> > @@ -9666,21 +9659,41 @@ static void mark_pkt_end(struct bpf_verifier_state *vstate, int regn, bool range
> >               reg->range = AT_PKT_END;
> >  }
> >
> > +static int release_reference_nomark(struct bpf_verifier_state *state, int ref_obj_id)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < state->acquired_refs; i++) {
> > +             if (state->refs[i].type != REF_TYPE_PTR)
> > +                     continue;
> > +             if (state->refs[i].id == ref_obj_id) {
> > +                     release_reference_state(state, i);
> > +                     return 0;
> > +             }
> > +     }
> > +     return -EINVAL;
> > +}
> > +
> >  /* The pointer with the specified id has released its reference to kernel
> >   * resources. Identify all copies of the same pointer and clear the reference.
> > + *
> > + * This is the release function corresponding to acquire_reference(). Idempotent.
> > + * The 'mark' boolean is used to optionally skip scrubbing registers matching
>           ^^^^^^
> Nit: this is probably a remnant of some older patch revision;
>      the function no longer takes a 'mark' parameter.

Yeah, this is a leftover. Sorry about that. Will fix.

>
> > + * the ref_obj_id, in case they need to be switched to some other type instead
> > + * of havoc scalar value.
> >   */
> > -static int release_reference(struct bpf_verifier_env *env,
> > -                          int ref_obj_id)
> > +static int release_reference(struct bpf_verifier_env *env, int ref_obj_id)
> >  {
>
> [...]
>

* Re: [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-27 16:58 ` [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
@ 2024-11-28  4:31   ` Eduard Zingerman
  2024-11-28  4:39     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 21+ messages in thread
From: Eduard Zingerman @ 2024-11-28  4:31 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:
> Teach the verifier about IRQ-disabled sections through the introduction
> of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
> them, and bpf_local_irq_restore, to restore IRQ state and enable them
> back again.
> 
> For the purposes of tracking the saved IRQ state, the verifier is taught
> about a new special object on the stack of type STACK_IRQ_FLAG. This is
> an 8-byte value which holds the IRQ flags that are to be passed back to
> the IRQ restore kfunc.
> 
> Renumber the REF_TYPE_* enums to simplify the check in
> find_lock_state; filtering out non-lock types one by one would become
> cumbersome as they grow, and is unnecessary.
> 
> To track a dynamic number of IRQ-disabled regions and their associated
> saved states, a new resource type REF_TYPE_IRQ is introduced, with its
> state management functions: acquire_irq_state and release_irq_state,
> taking advantage of the refactoring and clean ups made in earlier
> commits.
> 
> One notable requirement of the kernel's IRQ save and restore API is that
> they cannot happen out of order. For this purpose, when releasing a reference
> we keep track of the prev_id we saw with REF_TYPE_IRQ. Since reference
> states are inserted in increasing order of the index, this is used to
> remember the ordering of acquisitions of IRQ saved states, so that we
> maintain a logical stack in acquisition order of resource identities,
> and can enforce LIFO ordering when restoring IRQ state. The top of the
> stack is maintained using bpf_verifier_state's active_irq_id.
> 
> The logic to detect initialized and uninitialized irq flag slots, marking
> and unmarking is similar to how it's done for iterators. No additional
> checks are needed in refsafe for REF_TYPE_IRQ, apart from the usual
> check_id satisfiability check on the ref[i].id. We have to perform the
> same check_ids check on state->active_irq_id as well.
> 
> The kfuncs themselves are plain wrappers over local_irq_save and
> local_irq_restore macros.
> 
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
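A user-space sketch of the LIFO bookkeeping the commit message describes, assuming a fixed-size refs array. `acquire_irq()`, `release_irq()`, and the `REF_*` values are hypothetical stand-ins. Only the idea comes from the text: track `prev_id` while scanning, and make it the new `active_irq_id` on release. Entries are removed with `memmove()`, because a swap-remove would break the acquisition order that `prev_id` relies on.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_REFS 16 /* illustrative fixed capacity */

enum { REF_PTR = 1, REF_IRQ = 2 }; /* hypothetical type tags */

struct ref_state {
	int type;
	int id;
	int insn_idx;
};

struct vstate {
	struct ref_state refs[MAX_REFS];
	int acquired_refs;
	int active_irq_id; /* top of the logical IRQ stack, 0 if none */
	int id_gen;
};

/* Save: append a new IRQ entry and make it the top of the stack. */
static int acquire_irq(struct vstate *st, int insn_idx)
{
	struct ref_state *s;

	if (st->acquired_refs == MAX_REFS)
		return -ENOMEM;
	s = &st->refs[st->acquired_refs++];
	s->type = REF_IRQ;
	s->id = ++st->id_gen;
	s->insn_idx = insn_idx;
	st->active_irq_id = s->id;
	return s->id;
}

/* Restore: only the top entry may be released. Entries are appended in
 * acquisition order, so the last IRQ entry seen before the one being
 * released (prev_id) becomes the new top.
 */
static int release_irq(struct vstate *st, int id)
{
	int i, prev_id = 0;

	if (id != st->active_irq_id)
		return -EACCES; /* out-of-order restore */

	for (i = 0; i < st->acquired_refs; i++) {
		if (st->refs[i].type != REF_IRQ)
			continue;
		if (st->refs[i].id == id) {
			/* order-preserving removal */
			memmove(&st->refs[i], &st->refs[i + 1],
				(size_t)(st->acquired_refs - i - 1) * sizeof(st->refs[0]));
			st->acquired_refs--;
			memset(&st->refs[st->acquired_refs], 0, sizeof(st->refs[0]));
			st->active_irq_id = prev_id;
			return 0;
		}
		prev_id = st->refs[i].id;
	}
	return -EFAULT; /* active id not found: internal inconsistency */
}
```

Non-IRQ entries interleaved between saves are skipped by the type check, so they neither affect `prev_id` nor lose their position.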

Sorry, two more nits below.

[...]

> +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	struct bpf_stack_state *slot;
> +	struct bpf_reg_state *st;
> +	int spi, i, err;
> +
> +	spi = irq_flag_get_spi(env, reg);
> +	if (spi < 0)
> +		return spi;
> +
> +	slot = &state->stack[spi];
> +	st = &slot->spilled_ptr;
> +
> +	err = release_irq_state(env->cur_state, st->ref_obj_id);
> +	WARN_ON_ONCE(err && err != -EACCES);
> +	if (err) {
> +		verbose(env, "cannot restore irq state out of order\n");

Nit: maybe also print active_irq_id and the instruction where it was acquired?

> +		return err;
> +	}
> +
> +	__mark_reg_not_init(env, st);
> +
> +	/* see unmark_stack_slots_dynptr() for why we need to set REG_LIVE_WRITTEN */
> +	st->live |= REG_LIVE_WRITTEN;
> +
> +	for (i = 0; i < BPF_REG_SIZE; i++)
> +		slot->slot_type[i] = STACK_INVALID;
> +
> +	mark_stack_slot_scratched(env, spi);
> +	return 0;
> +}
> +
> +static bool is_irq_flag_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	struct bpf_stack_state *slot;
> +	int spi, i;
> +
> +	/* For -ERANGE (i.e. spi not falling into allocated stack slots), we
> +	 * will do check_mem_access to check and update stack bounds later, so
> +	 * return true for that case.
> +	 */
> +	spi = irq_flag_get_spi(env, reg);
> +	if (spi == -ERANGE)
> +		return true;

Nit: is it possible to swap is_irq_flag_reg_valid_uninit() and
     check_mem_access(), so that the ERANGE special case would not be needed?

> +	if (spi < 0)
> +		return false;
> +
> +	slot = &state->stack[spi];
> +
> +	for (i = 0; i < BPF_REG_SIZE; i++)
> +		if (slot->slot_type[i] == STACK_IRQ_FLAG)
> +			return false;
> +	return true;
> +}

[...]


* Re: [PATCH bpf-next v3 5/7] bpf: Improve verifier log for resource leak on exit
  2024-11-27 16:58 ` [PATCH bpf-next v3 5/7] bpf: Improve verifier log for resource leak on exit Kumar Kartikeya Dwivedi
@ 2024-11-28  4:34   ` Eduard Zingerman
  0 siblings, 0 replies; 21+ messages in thread
From: Eduard Zingerman @ 2024-11-28  4:34 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:
> The verifier log when leaking resources on BPF_EXIT may be a bit
> confusing, as it's a problem only when finally exiting from the main
> prog, not from any of the subprogs. Hence, update the verifier error
> string and the corresponding selftests matching on it.
> 
> Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
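The rule behind the reworded message can be stated as a one-line predicate. This is a sketch assuming that, after patch 1, held references live in the verifier state rather than in a per-frame state. `leaks_on_exit()` and the reduced struct are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced state: references are tracked once per verifier
 * state, not per frame.
 */
struct vstate {
	int curframe;      /* 0 == main program */
	int acquired_refs; /* references still held */
};

/* Held references are an error only at the final exit from the main
 * program; after a subprog returns, the caller may still release them.
 */
static bool leaks_on_exit(const struct vstate *st)
{
	return st->curframe == 0 && st->acquired_refs > 0;
}
```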

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]


* Re: [PATCH bpf-next v3 2/7] bpf: Refactor {acquire,release}_reference_state
  2024-11-28  4:30     ` Kumar Kartikeya Dwivedi
@ 2024-11-28  4:36       ` Eduard Zingerman
  0 siblings, 0 replies; 21+ messages in thread
From: Eduard Zingerman @ 2024-11-28  4:36 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 2024-11-28 at 05:30 +0100, Kumar Kartikeya Dwivedi wrote:
> On Thu, 28 Nov 2024 at 05:13, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:
> > 
> > Overall looks good, but please take a look at a few notes below.
> > 
> > [...]
> > 
> > > @@ -1349,77 +1350,69 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
> > >   * On success, returns a valid pointer id to associate with the register
> > >   * On failure, returns a negative errno.
> > >   */
> > > -static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
> > > +static struct bpf_reference_state *acquire_reference_state(struct bpf_verifier_env *env, int insn_idx, bool gen_id)
> > >  {
> > >       struct bpf_verifier_state *state = env->cur_state;
> > >       int new_ofs = state->acquired_refs;
> > > -     int id, err;
> > > +     int err;
> > > 
> > >       err = resize_reference_state(state, state->acquired_refs + 1);
> > >       if (err)
> > > -             return err;
> > > -     id = ++env->id_gen;
> > > -     state->refs[new_ofs].type = REF_TYPE_PTR;
> > > -     state->refs[new_ofs].id = id;
> > > +             return NULL;
> > > +     if (gen_id)
> > > +             state->refs[new_ofs].id = ++env->id_gen;
> > 
> > Nit: state->refs[new_ofs].id might end up with a garbage value if 'gen_id' is false.
> >      The resize_reference_state() uses realloc_array(),
> >      which allocates memory with GFP_KERNEL, but without the __GFP_ZERO flag.
> >      This is not a problem with the current patch, as you always check
> >      the reference type before checking the id, but most of the data structures
> >      in the verifier are zero initialized just in case.
> 
> When gen_id is false, the caller assigns s->id itself (e.g.
> acquire_lock_state), so I think we'll be fine without __GFP_ZERO.

Oh, I see, thank you for explaining.

[...]


* Re: [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-28  4:31   ` Eduard Zingerman
@ 2024-11-28  4:39     ` Kumar Kartikeya Dwivedi
  2024-11-28  7:26       ` Eduard Zingerman
  0 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-28  4:39 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 28 Nov 2024 at 05:31, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:
> > Teach the verifier about IRQ-disabled sections through the introduction
> > of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
> > them, and bpf_local_irq_restore, to restore IRQ state and enable them
> > back again.
> >
> > For the purposes of tracking the saved IRQ state, the verifier is taught
> > about a new special object on the stack of type STACK_IRQ_FLAG. This is
> > an 8-byte value which holds the IRQ flags that are to be passed back to
> > the IRQ restore kfunc.
> >
> > Renumber the REF_TYPE_* enums to simplify the check in
> > find_lock_state; filtering out non-lock types one by one would become
> > cumbersome as they grow, and is unnecessary.
> >
> > To track a dynamic number of IRQ-disabled regions and their associated
> > saved states, a new resource type REF_TYPE_IRQ is introduced, with its
> > state management functions: acquire_irq_state and release_irq_state,
> > taking advantage of the refactoring and clean ups made in earlier
> > commits.
> >
> > One notable requirement of the kernel's IRQ save and restore API is that
> > they cannot happen out of order. For this purpose, when releasing a reference
> > we keep track of the prev_id we saw with REF_TYPE_IRQ. Since reference
> > states are inserted in increasing order of the index, this is used to
> > remember the ordering of acquisitions of IRQ saved states, so that we
> > maintain a logical stack in acquisition order of resource identities,
> > and can enforce LIFO ordering when restoring IRQ state. The top of the
> > stack is maintained using bpf_verifier_state's active_irq_id.
> >
> > The logic to detect initialized and uninitialized irq flag slots, marking
> > and unmarking is similar to how it's done for iterators. No additional
> > checks are needed in refsafe for REF_TYPE_IRQ, apart from the usual
> > check_id satisfiability check on the ref[i].id. We have to perform the
> > same check_ids check on state->active_irq_id as well.
> >
> > The kfuncs themselves are plain wrappers over local_irq_save and
> > local_irq_restore macros.
> >
> > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>
> Sorry, two more nits below.
>
> [...]
>
> > +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     struct bpf_stack_state *slot;
> > +     struct bpf_reg_state *st;
> > +     int spi, i, err;
> > +
> > +     spi = irq_flag_get_spi(env, reg);
> > +     if (spi < 0)
> > +             return spi;
> > +
> > +     slot = &state->stack[spi];
> > +     st = &slot->spilled_ptr;
> > +
> > +     err = release_irq_state(env->cur_state, st->ref_obj_id);
> > +     WARN_ON_ONCE(err && err != -EACCES);
> > +     if (err) {
> > +             verbose(env, "cannot restore irq state out of order\n");
>
> Nit: maybe also print active_irq_id and the instruction where it was acquired?

Ack. For printing the insn_idx, I guess just search in the refs array?
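A sketch of the lookup proposed here: scan the refs array for the IRQ entry whose id equals `active_irq_id` and report its `insn_idx`. The helper names, the `REF_*` values, and the exact message text are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { REF_PTR = 1, REF_IRQ = 2 }; /* hypothetical type tags */

struct ref_state {
	int type;
	int id;
	int insn_idx;
};

/* Return the insn_idx at which IRQ state 'id' was saved, or -1. */
static int irq_acquire_insn(const struct ref_state *refs, int n, int id)
{
	for (int i = 0; i < n; i++)
		if (refs[i].type == REF_IRQ && refs[i].id == id)
			return refs[i].insn_idx;
	return -1;
}

/* Build the out-of-order error into 'buf'; returns snprintf()'s result. */
static int format_out_of_order(char *buf, size_t len, const struct ref_state *refs,
			       int n, int active_irq_id)
{
	return snprintf(buf, len,
			"cannot restore irq state out of order, expected id=%d acquired at insn_idx=%d\n",
			active_irq_id, irq_acquire_insn(refs, n, active_irq_id));
}
```

A linear scan is fine on this error path, since it only runs once verification has already failed.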

>
> > +             return err;
> > +     }
> > +
> > +     __mark_reg_not_init(env, st);
> > +
> > +     /* see unmark_stack_slots_dynptr() for why we need to set REG_LIVE_WRITTEN */
> > +     st->live |= REG_LIVE_WRITTEN;
> > +
> > +     for (i = 0; i < BPF_REG_SIZE; i++)
> > +             slot->slot_type[i] = STACK_INVALID;
> > +
> > +     mark_stack_slot_scratched(env, spi);
> > +     return 0;
> > +}
> > +
> > +static bool is_irq_flag_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     struct bpf_stack_state *slot;
> > +     int spi, i;
> > +
> > +     /* For -ERANGE (i.e. spi not falling into allocated stack slots), we
> > +      * will do check_mem_access to check and update stack bounds later, so
> > +      * return true for that case.
> > +      */
> > +     spi = irq_flag_get_spi(env, reg);
> > +     if (spi == -ERANGE)
> > +             return true;
>
> Nit: is it possible to swap is_irq_flag_reg_valid_uninit() and
>      check_mem_access(), so that the ERANGE special case would not be needed?
>

I don't think so. For dynptr, iter, and irq, ERANGE indicates the stack
needs to be grown, so check_mem_access will naturally do that when writing.
When not ERANGE, we need to catch cases where we have a bad slot_type.
If we overwrote it with check_mem_access first, it would scrub the slot
type as well.

When I fixed this stuff for dynptr, we had to additionally call
destroy_if_dynptr_stack_slot, because it wasn't required to 'release' a
dynptr when overwriting it.
Andrii made sure this was necessary for iters, so now slot_type ==
STACK_ITER is simply rejected instead of being overwritten without a
destroy operation.
A similar idea is followed for the irq flag.

Just paging in context for all this, but I may be missing something if
you have something else in mind.
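The ordering argument can be reduced to a toy model of one 8-byte stack slot. `generic_write()` stands in for the slot-type overwrite a normal stack write performs. All names and enum values are hypothetical, and the `-ERANGE` case (stack not yet allocated) is not modeled. Checking first catches a live IRQ flag, while writing first erases the evidence.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOT_BYTES 8

enum slot_type { SLOT_INVALID, SLOT_MISC, SLOT_IRQ_FLAG }; /* hypothetical */

struct slot {
	enum slot_type type[SLOT_BYTES];
};

/* Reject a slot that still holds (part of) a live IRQ flag. */
static bool slot_valid_uninit(const struct slot *s)
{
	for (int i = 0; i < SLOT_BYTES; i++)
		if (s->type[i] == SLOT_IRQ_FLAG)
			return false;
	return true;
}

/* Stand-in for a generic stack write: it overwrites every slot type. */
static void generic_write(struct slot *s)
{
	for (int i = 0; i < SLOT_BYTES; i++)
		s->type[i] = SLOT_MISC;
}

/* Check, then write: overwriting a live IRQ flag is refused. */
static bool save_check_then_write(struct slot *s)
{
	if (!slot_valid_uninit(s))
		return false;
	generic_write(s);
	return true;
}

/* Write, then check: the write scrubs the slot type first, so the
 * overwrite of a live IRQ flag goes unnoticed.
 */
static bool save_write_then_check(struct slot *s)
{
	generic_write(s);
	return slot_valid_uninit(s);
}
```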

> > +     if (spi < 0)
> > +             return false;
> > +
> > +     slot = &state->stack[spi];
> > +
> > +     for (i = 0; i < BPF_REG_SIZE; i++)
> > +             if (slot->slot_type[i] == STACK_IRQ_FLAG)
> > +                     return false;
> > +     return true;
> > +}
>
> [...]
>

* Re: [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-28  4:39     ` Kumar Kartikeya Dwivedi
@ 2024-11-28  7:26       ` Eduard Zingerman
  0 siblings, 0 replies; 21+ messages in thread
From: Eduard Zingerman @ 2024-11-28  7:26 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 2024-11-28 at 05:39 +0100, Kumar Kartikeya Dwivedi wrote:

[...]

> > > +static bool is_irq_flag_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > +{
> > > +     struct bpf_func_state *state = func(env, reg);
> > > +     struct bpf_stack_state *slot;
> > > +     int spi, i;
> > > +
> > > +     /* For -ERANGE (i.e. spi not falling into allocated stack slots), we
> > > +      * will do check_mem_access to check and update stack bounds later, so
> > > +      * return true for that case.
> > > +      */
> > > +     spi = irq_flag_get_spi(env, reg);
> > > +     if (spi == -ERANGE)
> > > +             return true;
> > 
> > Nit: is it possible to swap is_irq_flag_reg_valid_uninit() and
> >      check_mem_access(), so that the ERANGE special case would not be needed?
> > 
> 
> I don't think so. For dynptr, iter, and irq, ERANGE indicates the stack
> needs to be grown, so check_mem_access will naturally do that when writing.
> When not ERANGE, we need to catch cases where we have a bad slot_type.
> If we overwrote it with check_mem_access first, it would scrub the slot
> type as well.
> 
> When I fixed this stuff for dynptr, we had to additionally call
> destroy_if_dynptr_stack_slot, because it wasn't required to 'release' a
> dynptr when overwriting it.
> Andrii made sure this was necessary for iters, so now slot_type ==
> STACK_ITER is simply rejected instead of being overwritten without a
> destroy operation.
> A similar idea is followed for the irq flag.
> 
> Just paging in context for all this, but I may be missing something if
> you have something else in mind.

I see, makes sense. And is_dynptr_reg_valid_uninit() has the same check.
Thank you for explaining.

> > > +     if (spi < 0)
> > > +             return false;
> > > +
> > > +     slot = &state->stack[spi];
> > > +
> > > +     for (i = 0; i < BPF_REG_SIZE; i++)
> > > +             if (slot->slot_type[i] == STACK_IRQ_FLAG)
> > > +                     return false;
> > > +     return true;
> > > +}
> > 
> > [...]
> > 



end of thread (latest message: 2024-11-28  7:26 UTC)

Thread overview: 21+ messages
2024-11-27 16:58 [PATCH bpf-next v3 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
2024-11-27 16:58 ` [PATCH bpf-next v3 1/7] bpf: Consolidate locks and reference state in verifier state Kumar Kartikeya Dwivedi
2024-11-28  2:39   ` Eduard Zingerman
2024-11-28  2:54     ` Kumar Kartikeya Dwivedi
2024-11-28  3:03       ` Eduard Zingerman
2024-11-28  3:18         ` Kumar Kartikeya Dwivedi
2024-11-28  3:22           ` Eduard Zingerman
2024-11-28  3:32             ` Kumar Kartikeya Dwivedi
2024-11-27 16:58 ` [PATCH bpf-next v3 2/7] bpf: Refactor {acquire,release}_reference_state Kumar Kartikeya Dwivedi
2024-11-28  4:13   ` Eduard Zingerman
2024-11-28  4:30     ` Kumar Kartikeya Dwivedi
2024-11-28  4:36       ` Eduard Zingerman
2024-11-27 16:58 ` [PATCH bpf-next v3 3/7] bpf: Refactor mark_{dynptr,iter}_read Kumar Kartikeya Dwivedi
2024-11-27 16:58 ` [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
2024-11-28  4:31   ` Eduard Zingerman
2024-11-28  4:39     ` Kumar Kartikeya Dwivedi
2024-11-28  7:26       ` Eduard Zingerman
2024-11-27 16:58 ` [PATCH bpf-next v3 5/7] bpf: Improve verifier log for resource leak on exit Kumar Kartikeya Dwivedi
2024-11-28  4:34   ` Eduard Zingerman
2024-11-27 16:58 ` [PATCH bpf-next v3 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc Kumar Kartikeya Dwivedi
2024-11-27 16:58 ` [PATCH bpf-next v3 7/7] selftests/bpf: Add IRQ save/restore tests Kumar Kartikeya Dwivedi
