[PATCH bpf-next v1 0/7] IRQ save/restore

BPF List
 help / color / mirror / Atom feed

* [PATCH bpf-next v1 0/7] IRQ save/restore
@ 2024-11-21  0:53 Kumar Kartikeya Dwivedi
  2024-11-21  0:53 ` [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management Kumar Kartikeya Dwivedi
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21  0:53 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

This set introduces support for managing IRQ state from BPF programs.
Two new kfuncs, bpf_local_irq_save, and bpf_local_irq_restore are
introduced to enable this functionality.

Intended use cases are writing IRQ safe data structures (e.g. memory
allocator) in BPF programs natively, and use in new spin locking
primitives intended to be introduced in the next few weeks.

The set begins with some refactoring patches before the actual
functionality is introduced. First, we rename bpf_reference_state to
bpf_resource_state, and stick to calling pointer resources as actual
references, as the meaning is right now mixed (with lock state
management functions calling acquire_reference_state and
check_reference_leak not considering lock resoures, causing confusion).

The first patch also takes this opportunity to eliminate code
duplication where possible. The second patch resolves an inconsistency
in the same set of functions for managing lock state, in preparation for
later patches that require passing bpf_func_state explicitly to find a
lock state.

Since we now manage locks in bpf_func_state alongside other resources,
this is a good opportunity to consolidate all other resources like RCU
read sections and preempt-disabled section tracking into bpf_func_state
as well, which is what patch 3 achieves.

After this, patch 4 refactors stack slot liveness marking logic to be
shared between dynptr, and iterators, in preparation for introducing
same logic for irq flag object on stack.

Finally, patch 5 and 7 introduce the new kfuncs and their selftests. For
more details, please inspect the patch commit logs. Patch 6 expands
coverage of existing preempt-disable selftest to cover sleepable kfuncs.

Kumar Kartikeya Dwivedi (7):
  bpf: Refactor and rename resource management
  bpf: Be consistent between {acquire,find,release}_lock_state
  bpf: Consolidate RCU and preempt locks in bpf_func_state
  bpf: Refactor mark_{dynptr,iter}_read
  bpf: Introduce support for bpf_local_irq_{save,restore}
  selftests/bpf: Expand coverage of preempt tests to sleepable kfunc
  selftests/bpf: Add IRQ save/restore tests

 include/linux/bpf_verifier.h                  |  47 +-
 kernel/bpf/helpers.c                          |  24 +
 kernel/bpf/log.c                              |  11 +-
 kernel/bpf/verifier.c                         | 540 ++++++++++++++----
 tools/testing/selftests/bpf/prog_tests/irq.c  |   9 +
 tools/testing/selftests/bpf/progs/irq.c       | 393 +++++++++++++
 .../selftests/bpf/progs/preempt_lock.c        |  12 +
 7 files changed, 889 insertions(+), 147 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/irq.c
 create mode 100644 tools/testing/selftests/bpf/progs/irq.c

base-commit: 2c8b09ac2537299511c898bc71b1a5f2756c831c
-- 
2.43.5

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management
  2024-11-21  0:53 [PATCH bpf-next v1 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
@ 2024-11-21  0:53 ` Kumar Kartikeya Dwivedi
  2024-11-21 16:57   ` Eduard Zingerman
  2024-11-22  0:24   ` Alexei Starovoitov
  2024-11-21  0:53 ` [PATCH bpf-next v1 2/7] bpf: Be consistent between {acquire,find,release}_lock_state Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21  0:53 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

With the commit f6b9a69a9e56 ("bpf: Refactor active lock management"),
we have begun using the acquired_refs array to also store active lock
metadata, as a way to consolidate and manage all kernel resources that
the program may acquire.

This is beginning to cause some confusion and duplication in existing
code, where the terms references now both mean lock reference state and
the references for acquired kernel object pointers. To clarify and
improve the current state of affairs, as well as reduce code duplication,
make the following changes:

Rename bpf_reference_state to bpf_resource_state, and begin using
resource as the umbrella term. This terminology matches what we use in
check_resource_leak. Next, "reference" now only means RES_TYPE_PTR, and
the usage and meaning is updated accordingly.

Next, factor out common code paths for managing addition and removal of
resource state in acquire_resource_state and erase_resource_state, and
then implement type specific resource handling on top of these common
functions. Overall, this patch improves upon the confusion and minimizes
code duplication, as we prepare to introduce new resource types in
subsequent patches.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  24 +++--
 kernel/bpf/log.c             |  10 +-
 kernel/bpf/verifier.c        | 173 +++++++++++++++++++----------------
 3 files changed, 108 insertions(+), 99 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f4290c179bee..e5123b6804eb 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -249,20 +249,18 @@ struct bpf_stack_state {
 	u8 slot_type[BPF_REG_SIZE];
 };
 
-struct bpf_reference_state {
-	/* Each reference object has a type. Ensure REF_TYPE_PTR is zero to
-	 * default to pointer reference on zero initialization of a state.
-	 */
-	enum ref_state_type {
-		REF_TYPE_PTR = 0,
-		REF_TYPE_LOCK,
+struct bpf_resource_state {
+	enum res_state_type {
+		RES_TYPE_INV = -1,
+		RES_TYPE_PTR = 0,
+		RES_TYPE_LOCK,
 	} type;
-	/* Track each reference created with a unique id, even if the same
-	 * instruction creates the reference multiple times (eg, via CALL).
+	/* Track each resource created with a unique id, even if the same
+	 * instruction creates the resource multiple times (eg, via CALL).
 	 */
 	int id;
-	/* Instruction where the allocation of this reference occurred. This
-	 * is used purely to inform the user of a reference leak.
+	/* Instruction where the allocation of this resource occurred. This
+	 * is used purely to inform the user of a resource leak.
 	 */
 	int insn_idx;
 	/* Use to keep track of the source object of a lock, to ensure
@@ -315,9 +313,9 @@ struct bpf_func_state {
 	u32 callback_depth;
 
 	/* The following fields should be last. See copy_func_state() */
-	int acquired_refs;
+	int acquired_res;
 	int active_locks;
-	struct bpf_reference_state *refs;
+	struct bpf_resource_state *res;
 	/* The state of the stack. Each element of the array describes BPF_REG_SIZE
 	 * (i.e. 8) bytes worth of stack memory.
 	 * stack[0] represents bytes [*(r10-8)..*(r10-1)]
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 4a858fdb6476..0ad6f0737c57 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -843,11 +843,11 @@ void print_verifier_state(struct bpf_verifier_env *env, const struct bpf_func_st
 			break;
 		}
 	}
-	if (state->acquired_refs && state->refs[0].id) {
-		verbose(env, " refs=%d", state->refs[0].id);
-		for (i = 1; i < state->acquired_refs; i++)
-			if (state->refs[i].id)
-				verbose(env, ",%d", state->refs[i].id);
+	if (state->acquired_res && state->res[0].id) {
+		verbose(env, " refs=%d", state->res[0].id);
+		for (i = 1; i < state->acquired_res; i++)
+			if (state->res[i].id)
+				verbose(env, ",%d", state->res[i].id);
 	}
 	if (state->in_callback_fn)
 		verbose(env, " cb");
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1c4ebb326785..c106720d0c62 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1279,15 +1279,15 @@ static void *realloc_array(void *arr, size_t old_n, size_t new_n, size_t size)
 	return arr ? arr : ZERO_SIZE_PTR;
 }
 
-static int copy_reference_state(struct bpf_func_state *dst, const struct bpf_func_state *src)
+static int copy_resource_state(struct bpf_func_state *dst, const struct bpf_func_state *src)
 {
-	dst->refs = copy_array(dst->refs, src->refs, src->acquired_refs,
-			       sizeof(struct bpf_reference_state), GFP_KERNEL);
-	if (!dst->refs)
+	dst->res = copy_array(dst->res, src->res, src->acquired_res,
+			      sizeof(struct bpf_resource_state), GFP_KERNEL);
+	if (!dst->res)
 		return -ENOMEM;
 
+	dst->acquired_res = src->acquired_res;
 	dst->active_locks = src->active_locks;
-	dst->acquired_refs = src->acquired_refs;
 	return 0;
 }
 
@@ -1304,14 +1304,14 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st
 	return 0;
 }
 
-static int resize_reference_state(struct bpf_func_state *state, size_t n)
+static int resize_resource_state(struct bpf_func_state *state, size_t n)
 {
-	state->refs = realloc_array(state->refs, state->acquired_refs, n,
-				    sizeof(struct bpf_reference_state));
-	if (!state->refs)
+	state->res = realloc_array(state->res, state->acquired_res, n,
+				   sizeof(struct bpf_resource_state));
+	if (!state->res)
 		return -ENOMEM;
 
-	state->acquired_refs = n;
+	state->acquired_res = n;
 	return 0;
 }
 
@@ -1342,6 +1342,25 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
 	return 0;
 }
 
+static struct bpf_resource_state *acquire_resource_state(struct bpf_verifier_env *env, int insn_idx, int *id)
+{
+	struct bpf_func_state *state = cur_func(env);
+	int new_ofs = state->acquired_res;
+	struct bpf_resource_state *s;
+	int err;
+
+	err = resize_resource_state(state, state->acquired_res + 1);
+	if (err)
+		return NULL;
+	s = &state->res[new_ofs];
+	s->type = RES_TYPE_INV;
+	if (id)
+		*id = s->id = ++env->id_gen;
+	s->insn_idx = insn_idx;
+
+	return s;
+}
+
 /* Acquire a pointer id from the env and update the state->refs to include
  * this new pointer reference.
  * On success, returns a valid pointer id to associate with the register
@@ -1349,55 +1368,52 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
  */
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
 {
-	struct bpf_func_state *state = cur_func(env);
-	int new_ofs = state->acquired_refs;
-	int id, err;
-
-	err = resize_reference_state(state, state->acquired_refs + 1);
-	if (err)
-		return err;
-	id = ++env->id_gen;
-	state->refs[new_ofs].type = REF_TYPE_PTR;
-	state->refs[new_ofs].id = id;
-	state->refs[new_ofs].insn_idx = insn_idx;
+	struct bpf_resource_state *s;
+	int id;
 
+	s = acquire_resource_state(env, insn_idx, &id);
+	if (!s)
+		return -ENOMEM;
+	s->type = RES_TYPE_PTR;
 	return id;
 }
 
-static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum ref_state_type type,
+static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum res_state_type type,
 			      int id, void *ptr)
 {
 	struct bpf_func_state *state = cur_func(env);
-	int new_ofs = state->acquired_refs;
-	int err;
+	struct bpf_resource_state *s;
 
-	err = resize_reference_state(state, state->acquired_refs + 1);
-	if (err)
-		return err;
-	state->refs[new_ofs].type = type;
-	state->refs[new_ofs].id = id;
-	state->refs[new_ofs].insn_idx = insn_idx;
-	state->refs[new_ofs].ptr = ptr;
+	s = acquire_resource_state(env, insn_idx, NULL);
+	if (!s)
+		return -ENOMEM;
+	s->type = type;
+	s->id = id;
+	s->ptr = ptr;
 
 	state->active_locks++;
 	return 0;
 }
 
-/* release function corresponding to acquire_reference_state(). Idempotent. */
+static void erase_resource_state(struct bpf_func_state *state, int res_idx)
+{
+	int last_idx = state->acquired_res - 1;
+
+	if (last_idx && res_idx != last_idx)
+		memcpy(&state->res[res_idx], &state->res[last_idx], sizeof(*state->res));
+	memset(&state->res[last_idx], 0, sizeof(*state->res));
+	state->acquired_res--;
+}
+
 static int release_reference_state(struct bpf_func_state *state, int ptr_id)
 {
-	int i, last_idx;
+	int i;
 
-	last_idx = state->acquired_refs - 1;
-	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].type != REF_TYPE_PTR)
+	for (i = 0; i < state->acquired_res; i++) {
+		if (state->res[i].type != RES_TYPE_PTR)
 			continue;
-		if (state->refs[i].id == ptr_id) {
-			if (last_idx && i != last_idx)
-				memcpy(&state->refs[i], &state->refs[last_idx],
-				       sizeof(*state->refs));
-			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
-			state->acquired_refs--;
+		if (state->res[i].id == ptr_id) {
+			erase_resource_state(state, i);
 			return 0;
 		}
 	}
@@ -1406,18 +1422,13 @@ static int release_reference_state(struct bpf_func_state *state, int ptr_id)
 
 static int release_lock_state(struct bpf_func_state *state, int type, int id, void *ptr)
 {
-	int i, last_idx;
+	int i;
 
-	last_idx = state->acquired_refs - 1;
-	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].type != type)
+	for (i = 0; i < state->acquired_res; i++) {
+		if (state->res[i].type != type)
 			continue;
-		if (state->refs[i].id == id && state->refs[i].ptr == ptr) {
-			if (last_idx && i != last_idx)
-				memcpy(&state->refs[i], &state->refs[last_idx],
-				       sizeof(*state->refs));
-			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
-			state->acquired_refs--;
+		if (state->res[i].id == id && state->res[i].ptr == ptr) {
+			erase_resource_state(state, i);
 			state->active_locks--;
 			return 0;
 		}
@@ -1425,16 +1436,16 @@ static int release_lock_state(struct bpf_func_state *state, int type, int id, vo
 	return -EINVAL;
 }
 
-static struct bpf_reference_state *find_lock_state(struct bpf_verifier_env *env, enum ref_state_type type,
+static struct bpf_resource_state *find_lock_state(struct bpf_verifier_env *env, enum res_state_type type,
 						   int id, void *ptr)
 {
 	struct bpf_func_state *state = cur_func(env);
 	int i;
 
-	for (i = 0; i < state->acquired_refs; i++) {
-		struct bpf_reference_state *s = &state->refs[i];
+	for (i = 0; i < state->acquired_res; i++) {
+		struct bpf_resource_state *s = &state->res[i];
 
-		if (s->type == REF_TYPE_PTR || s->type != type)
+		if (s->type == RES_TYPE_PTR || s->type != type)
 			continue;
 
 		if (s->id == id && s->ptr == ptr)
@@ -1447,7 +1458,7 @@ static void free_func_state(struct bpf_func_state *state)
 {
 	if (!state)
 		return;
-	kfree(state->refs);
+	kfree(state->res);
 	kfree(state->stack);
 	kfree(state);
 }
@@ -1473,8 +1484,8 @@ static int copy_func_state(struct bpf_func_state *dst,
 {
 	int err;
 
-	memcpy(dst, src, offsetof(struct bpf_func_state, acquired_refs));
-	err = copy_reference_state(dst, src);
+	memcpy(dst, src, offsetof(struct bpf_func_state, acquired_res));
+	err = copy_resource_state(dst, src);
 	if (err)
 		return err;
 	return copy_stack_state(dst, src);
@@ -7907,7 +7918,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 				"Locking two bpf_spin_locks are not allowed\n");
 			return -EINVAL;
 		}
-		err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr);
+		err = acquire_lock_state(env, env->insn_idx, RES_TYPE_LOCK, reg->id, ptr);
 		if (err < 0) {
 			verbose(env, "Failed to acquire lock state\n");
 			return err;
@@ -7925,7 +7936,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			return -EINVAL;
 		}
 
-		if (release_lock_state(cur_func(env), REF_TYPE_LOCK, reg->id, ptr)) {
+		if (release_lock_state(cur_func(env), RES_TYPE_LOCK, reg->id, ptr)) {
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
@@ -9758,7 +9769,7 @@ static int setup_func_entry(struct bpf_verifier_env *env, int subprog, int calls
 			state->curframe + 1 /* frameno within this callchain */,
 			subprog /* subprog number within this prog */);
 	/* Transfer references to the callee */
-	err = copy_reference_state(callee, caller);
+	err = copy_resource_state(callee, caller);
 	err = err ?: set_callee_state_cb(env, caller, callee, callsite);
 	if (err)
 		goto err_out;
@@ -10334,7 +10345,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 	}
 
 	/* Transfer references to the caller */
-	err = copy_reference_state(caller, callee);
+	err = copy_resource_state(caller, callee);
 	if (err)
 		return err;
 
@@ -10509,11 +10520,11 @@ static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exi
 	if (!exception_exit && state->frameno)
 		return 0;
 
-	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].type != REF_TYPE_PTR)
+	for (i = 0; i < state->acquired_res; i++) {
+		if (state->res[i].type != RES_TYPE_PTR)
 			continue;
 		verbose(env, "Unreleased reference id=%d alloc_insn=%d\n",
-			state->refs[i].id, state->refs[i].insn_idx);
+			state->res[i].id, state->res[i].insn_idx);
 		refs_lingering = true;
 	}
 	return refs_lingering ? -EINVAL : 0;
@@ -11777,8 +11788,8 @@ static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_o
 		return -EFAULT;
 	}
 
-	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].id != ref_obj_id)
+	for (i = 0; i < state->acquired_res; i++) {
+		if (state->res[i].id != ref_obj_id)
 			continue;
 
 		/* Clear ref_obj_id here so release_reference doesn't clobber
@@ -11843,7 +11854,7 @@ static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_o
  */
 static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
-	struct bpf_reference_state *s;
+	struct bpf_resource_state *s;
 	void *ptr;
 	u32 id;
 
@@ -11862,7 +11873,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_
 
 	if (!cur_func(env)->active_locks)
 		return -EINVAL;
-	s = find_lock_state(env, REF_TYPE_LOCK, id, ptr);
+	s = find_lock_state(env, RES_TYPE_LOCK, id, ptr);
 	if (!s) {
 		verbose(env, "held lock and object are not in the same allocation\n");
 		return -EINVAL;
@@ -17750,27 +17761,27 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 	return true;
 }
 
-static bool refsafe(struct bpf_func_state *old, struct bpf_func_state *cur,
+static bool ressafe(struct bpf_func_state *old, struct bpf_func_state *cur,
 		    struct bpf_idmap *idmap)
 {
 	int i;
 
-	if (old->acquired_refs != cur->acquired_refs)
+	if (old->acquired_res != cur->acquired_res)
 		return false;
 
-	for (i = 0; i < old->acquired_refs; i++) {
-		if (!check_ids(old->refs[i].id, cur->refs[i].id, idmap) ||
-		    old->refs[i].type != cur->refs[i].type)
+	for (i = 0; i < old->acquired_res; i++) {
+		if (!check_ids(old->res[i].id, cur->res[i].id, idmap) ||
+		    old->res[i].type != cur->res[i].type)
 			return false;
-		switch (old->refs[i].type) {
-		case REF_TYPE_PTR:
+		switch (old->res[i].type) {
+		case RES_TYPE_PTR:
 			break;
-		case REF_TYPE_LOCK:
-			if (old->refs[i].ptr != cur->refs[i].ptr)
+		case RES_TYPE_LOCK:
+			if (old->res[i].ptr != cur->res[i].ptr)
 				return false;
 			break;
 		default:
-			WARN_ONCE(1, "Unhandled enum type for reference state: %d\n", old->refs[i].type);
+			WARN_ONCE(1, "Unhandled enum type for resource state: %d\n", old->res[i].type);
 			return false;
 		}
 	}
@@ -17820,7 +17831,7 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
 	if (!stacksafe(env, old, cur, &env->idmap_scratch, exact))
 		return false;
 
-	if (!refsafe(old, cur, &env->idmap_scratch))
+	if (!ressafe(old, cur, &env->idmap_scratch))
 		return false;
 
 	return true;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management
  2024-11-21  0:53 ` [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management Kumar Kartikeya Dwivedi
@ 2024-11-21 16:57   ` Eduard Zingerman
  2024-11-21 17:17     ` Kumar Kartikeya Dwivedi
  2024-11-22  0:24   ` Alexei Starovoitov
  1 sibling, 1 reply; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 16:57 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> With the commit f6b9a69a9e56 ("bpf: Refactor active lock management"),
> we have begun using the acquired_refs array to also store active lock
> metadata, as a way to consolidate and manage all kernel resources that
> the program may acquire.
> 
> This is beginning to cause some confusion and duplication in existing
> code, where the terms references now both mean lock reference state and
> the references for acquired kernel object pointers. To clarify and
> improve the current state of affairs, as well as reduce code duplication,
> make the following changes:
> 
> Rename bpf_reference_state to bpf_resource_state, and begin using
> resource as the umbrella term. This terminology matches what we use in
> check_resource_leak. Next, "reference" now only means RES_TYPE_PTR, and
> the usage and meaning is updated accordingly.
> 
> Next, factor out common code paths for managing addition and removal of
> resource state in acquire_resource_state and erase_resource_state, and
> then implement type specific resource handling on top of these common
> functions. Overall, this patch improves upon the confusion and minimizes
> code duplication, as we prepare to introduce new resource types in
> subsequent patches.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Tbh, I like the old name a bit more.
The patch itself looks good.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> @@ -1342,6 +1342,25 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
>  	return 0;
>  }
>  
> +static struct bpf_resource_state *acquire_resource_state(struct bpf_verifier_env *env, int insn_idx, int *id)

Nit: there is no need to pass `int *id`, as it is available as (returned)->id.

> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	int new_ofs = state->acquired_res;
> +	struct bpf_resource_state *s;
> +	int err;
> +
> +	err = resize_resource_state(state, state->acquired_res + 1);
> +	if (err)
> +		return NULL;
> +	s = &state->res[new_ofs];
> +	s->type = RES_TYPE_INV;
> +	if (id)
> +		*id = s->id = ++env->id_gen;
> +	s->insn_idx = insn_idx;
> +
> +	return s;
> +}
> +
>  /* Acquire a pointer id from the env and update the state->refs to include
>   * this new pointer reference.
>   * On success, returns a valid pointer id to associate with the register

[...]

> @@ -1349,55 +1368,52 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state

[...]

> -/* release function corresponding to acquire_reference_state(). Idempotent. */
> +static void erase_resource_state(struct bpf_func_state *state, int res_idx)

Nit: why not "release_..." to be consistent with the rest of the functions?

> +{
> +	int last_idx = state->acquired_res - 1;
> +
> +	if (last_idx && res_idx != last_idx)
> +		memcpy(&state->res[res_idx], &state->res[last_idx], sizeof(*state->res));
> +	memset(&state->res[last_idx], 0, sizeof(*state->res));
> +	state->acquired_res--;
> +}
> +
>  static int release_reference_state(struct bpf_func_state *state, int ptr_id)
>  {
> -	int i, last_idx;
> +	int i;
>  
> -	last_idx = state->acquired_refs - 1;
> -	for (i = 0; i < state->acquired_refs; i++) {
> -		if (state->refs[i].type != REF_TYPE_PTR)
> +	for (i = 0; i < state->acquired_res; i++) {
> +		if (state->res[i].type != RES_TYPE_PTR)
>  			continue;
> -		if (state->refs[i].id == ptr_id) {
> -			if (last_idx && i != last_idx)
> -				memcpy(&state->refs[i], &state->refs[last_idx],
> -				       sizeof(*state->refs));
> -			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
> -			state->acquired_refs--;
> +		if (state->res[i].id == ptr_id) {
> +			erase_resource_state(state, i);
>  			return 0;
>  		}
>  	}

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management
  2024-11-21 16:57   ` Eduard Zingerman
@ 2024-11-21 17:17     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21 17:17 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 21 Nov 2024 at 17:57, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > With the commit f6b9a69a9e56 ("bpf: Refactor active lock management"),
> > we have begun using the acquired_refs array to also store active lock
> > metadata, as a way to consolidate and manage all kernel resources that
> > the program may acquire.
> >
> > This is beginning to cause some confusion and duplication in existing
> > code, where the terms references now both mean lock reference state and
> > the references for acquired kernel object pointers. To clarify and
> > improve the current state of affairs, as well as reduce code duplication,
> > make the following changes:
> >
> > Rename bpf_reference_state to bpf_resource_state, and begin using
> > resource as the umbrella term. This terminology matches what we use in
> > check_resource_leak. Next, "reference" now only means RES_TYPE_PTR, and
> > the usage and meaning is updated accordingly.
> >
> > Next, factor out common code paths for managing addition and removal of
> > resource state in acquire_resource_state and erase_resource_state, and
> > then implement type specific resource handling on top of these common
> > functions. Overall, this patch improves upon the confusion and minimizes
> > code duplication, as we prepare to introduce new resource types in
> > subsequent patches.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Tbh, I like the old name a bit more.
> The patch itself looks good.
>

I am happy for suggestions on better naming, but it would be better to
make a distinction somehow.

> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > @@ -1342,6 +1342,25 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
> >       return 0;
> >  }
> >
> > +static struct bpf_resource_state *acquire_resource_state(struct bpf_verifier_env *env, int insn_idx, int *id)
>
> Nit: there is no need to pass `int *id`, as it is available as (returned)->id.
>

Replaced with a bool alloc_id to decide whether it generates a new id
or not, and fixed.

> > +{
> > +     struct bpf_func_state *state = cur_func(env);
> > +     int new_ofs = state->acquired_res;
> > +     struct bpf_resource_state *s;
> > +     int err;
> > +
> > +     err = resize_resource_state(state, state->acquired_res + 1);
> > +     if (err)
> > +             return NULL;
> > +     s = &state->res[new_ofs];
> > +     s->type = RES_TYPE_INV;
> > +     if (id)
> > +             *id = s->id = ++env->id_gen;
> > +     s->insn_idx = insn_idx;
> > +
> > +     return s;
> > +}
> > +
> >  /* Acquire a pointer id from the env and update the state->refs to include
> >   * this new pointer reference.
> >   * On success, returns a valid pointer id to associate with the register
>
> [...]
>
> > @@ -1349,55 +1368,52 @@ static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state
>
> [...]
>
> > -/* release function corresponding to acquire_reference_state(). Idempotent. */
> > +static void erase_resource_state(struct bpf_func_state *state, int res_idx)
>
> Nit: why not "release_..." to be consistent with the rest of the functions?
>

This was a subset of what "release_resource_state" would have done,
since it erases a res_idx,
but on second thought, it's probably better to rename, so fixed as well.

Thanks for the review.

> > +{
> > +     int last_idx = state->acquired_res - 1;
> > +
> > +     if (last_idx && res_idx != last_idx)
> > +             memcpy(&state->res[res_idx], &state->res[last_idx], sizeof(*state->res));
> > +     memset(&state->res[last_idx], 0, sizeof(*state->res));
> > +     state->acquired_res--;
> > +}
> > +
> >  static int release_reference_state(struct bpf_func_state *state, int ptr_id)
> >  {
> > -     int i, last_idx;
> > +     int i;
> >
> > -     last_idx = state->acquired_refs - 1;
> > -     for (i = 0; i < state->acquired_refs; i++) {
> > -             if (state->refs[i].type != REF_TYPE_PTR)
> > +     for (i = 0; i < state->acquired_res; i++) {
> > +             if (state->res[i].type != RES_TYPE_PTR)
> >                       continue;
> > -             if (state->refs[i].id == ptr_id) {
> > -                     if (last_idx && i != last_idx)
> > -                             memcpy(&state->refs[i], &state->refs[last_idx],
> > -                                    sizeof(*state->refs));
> > -                     memset(&state->refs[last_idx], 0, sizeof(*state->refs));
> > -                     state->acquired_refs--;
> > +             if (state->res[i].id == ptr_id) {
> > +                     erase_resource_state(state, i);
> >                       return 0;
> >               }
> >       }
>
> [...]
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management
  2024-11-21  0:53 ` [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management Kumar Kartikeya Dwivedi
  2024-11-21 16:57   ` Eduard Zingerman
@ 2024-11-22  0:24   ` Alexei Starovoitov
  2024-11-22  0:31     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2024-11-22  0:24 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Kernel Team

On Wed, Nov 20, 2024 at 4:53 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> With the commit f6b9a69a9e56 ("bpf: Refactor active lock management"),
> we have begun using the acquired_refs array to also store active lock
> metadata, as a way to consolidate and manage all kernel resources that
> the program may acquire.
>
> This is beginning to cause some confusion and duplication in existing
> code, where the terms references now both mean lock reference state and
> the references for acquired kernel object pointers. To clarify and
> improve the current state of affairs, as well as reduce code duplication,
> make the following changes:
>
> Rename bpf_reference_state to bpf_resource_state, and begin using
> resource as the umbrella term. This terminology matches what we use in
> check_resource_leak. Next, "reference" now only means RES_TYPE_PTR, and
> the usage and meaning is updated accordingly.


Sorry I don't like this renaming.
reference state is already understood as a set of resources that
were acquired.
Whether it's an object allocated by bpf_obj_new or any other
resource.
I think this patch has a net negative effect.
People familiar with the verifier already understand what
refsafe() or acquired_refs are for.
Calling them slightly different names adds confusion, not clarity.

pw-bot: cr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management
  2024-11-22  0:24   ` Alexei Starovoitov
@ 2024-11-22  0:31     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-22  0:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Kernel Team

On Fri, 22 Nov 2024 at 01:24, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Nov 20, 2024 at 4:53 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > With the commit f6b9a69a9e56 ("bpf: Refactor active lock management"),
> > we have begun using the acquired_refs array to also store active lock
> > metadata, as a way to consolidate and manage all kernel resources that
> > the program may acquire.
> >
> > This is beginning to cause some confusion and duplication in existing
> > code, where the terms references now both mean lock reference state and
> > the references for acquired kernel object pointers. To clarify and
> > improve the current state of affairs, as well as reduce code duplication,
> > make the following changes:
> >
> > Rename bpf_reference_state to bpf_resource_state, and begin using
> > resource as the umbrella term. This terminology matches what we use in
> > check_resource_leak. Next, "reference" now only means RES_TYPE_PTR, and
> > the usage and meaning is updated accordingly.
>
>
> Sorry I don't like this renaming.
> reference state is already understood as a set of resources that
> were acquired.
> Whether it's an object allocated by bpf_obj_new or any other
> resource.
> I think this patch has a net negative effect.
> People familiar with the verifier already understand what
> refsafe() or acquired_refs are for.
> Calling them slightly different names adds confusion, not clarity.
>
> pw-bot: cr

Ok

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v1 2/7] bpf: Be consistent between {acquire,find,release}_lock_state
  2024-11-21  0:53 [PATCH bpf-next v1 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
  2024-11-21  0:53 ` [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management Kumar Kartikeya Dwivedi
@ 2024-11-21  0:53 ` Kumar Kartikeya Dwivedi
  2024-11-21 17:54   ` Eduard Zingerman
  2024-11-21  0:53 ` [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21  0:53 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

Both acquire_lock_state and release_lock_state take the bpf_func_state
as a parameter, while find_lock_state does not. Future patches will end
up requiring operating on non-cur_func(env) bpf_func_state (for
resilient locks), hence just make the prototype consistent and take
bpf_func_state directly.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c106720d0c62..0ff436c06c13 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1436,10 +1436,9 @@ static int release_lock_state(struct bpf_func_state *state, int type, int id, vo
 	return -EINVAL;
 }
 
-static struct bpf_resource_state *find_lock_state(struct bpf_verifier_env *env, enum res_state_type type,
+static struct bpf_resource_state *find_lock_state(struct bpf_func_state *state, enum res_state_type type,
 						   int id, void *ptr)
 {
-	struct bpf_func_state *state = cur_func(env);
 	int i;
 
 	for (i = 0; i < state->acquired_res; i++) {
@@ -11873,7 +11872,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_
 
 	if (!cur_func(env)->active_locks)
 		return -EINVAL;
-	s = find_lock_state(env, RES_TYPE_LOCK, id, ptr);
+	s = find_lock_state(cur_func(env), RES_TYPE_LOCK, id, ptr);
 	if (!s) {
 		verbose(env, "held lock and object are not in the same allocation\n");
 		return -EINVAL;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 2/7] bpf: Be consistent between {acquire,find,release}_lock_state
  2024-11-21  0:53 ` [PATCH bpf-next v1 2/7] bpf: Be consistent between {acquire,find,release}_lock_state Kumar Kartikeya Dwivedi
@ 2024-11-21 17:54   ` Eduard Zingerman
  0 siblings, 0 replies; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 17:54 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> Both acquire_lock_state and release_lock_state take the bpf_func_state
> as a parameter, while find_lock_state does not. Future patches will end
> up requiring operating on non-cur_func(env) bpf_func_state (for
> resilient locks), hence just make the prototype consistent and take
> bpf_func_state directly.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state
  2024-11-21  0:53 [PATCH bpf-next v1 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
  2024-11-21  0:53 ` [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management Kumar Kartikeya Dwivedi
  2024-11-21  0:53 ` [PATCH bpf-next v1 2/7] bpf: Be consistent between {acquire,find,release}_lock_state Kumar Kartikeya Dwivedi
@ 2024-11-21  0:53 ` Kumar Kartikeya Dwivedi
  2024-11-21 18:09   ` Eduard Zingerman
  2024-11-21  0:53 ` [PATCH bpf-next v1 4/7] bpf: Refactor mark_{dynptr,iter}_read Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21  0:53 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

To ensure consistency in resource handling, move RCU and preemption
state counters to bpf_func_state, and convert all users to access them
through cur_func(env).

For the sake of consistency, also compare active_locks in ressafe as a
quick way to eliminate iteration and entry matching if the number of
locks are not the same.

OTOH, the comparison of active_preempt_locks and active_rcu_lock is
needed for correctness, as state exploration cannot be avoided if these
counters do not match, and not comparing them will lead to problems
since they lack an actual entry in the acquired_res array.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  4 ++--
 kernel/bpf/verifier.c        | 46 ++++++++++++++++++++----------------
 2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index e5123b6804eb..fa09538a35bc 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -315,6 +315,8 @@ struct bpf_func_state {
 	/* The following fields should be last. See copy_func_state() */
 	int acquired_res;
 	int active_locks;
+	int active_preempt_locks;
+	bool active_rcu_lock;
 	struct bpf_resource_state *res;
 	/* The state of the stack. Each element of the array describes BPF_REG_SIZE
 	 * (i.e. 8) bytes worth of stack memory.
@@ -418,8 +420,6 @@ struct bpf_verifier_state {
 	u32 curframe;
 
 	bool speculative;
-	bool active_rcu_lock;
-	u32 active_preempt_lock;
 	/* If this state was ever pointed-to by other state's loop_entry field
 	 * this flag would be set to true. Used to avoid freeing such states
 	 * while they are still in use.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0ff436c06c13..25c44b68f16a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1287,7 +1287,10 @@ static int copy_resource_state(struct bpf_func_state *dst, const struct bpf_func
 		return -ENOMEM;
 
 	dst->acquired_res = src->acquired_res;
+
 	dst->active_locks = src->active_locks;
+	dst->active_preempt_locks = src->active_preempt_locks;
+	dst->active_rcu_lock = src->active_rcu_lock;
 	return 0;
 }
 
@@ -1504,8 +1507,6 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 		dst_state->frame[i] = NULL;
 	}
 	dst_state->speculative = src->speculative;
-	dst_state->active_rcu_lock = src->active_rcu_lock;
-	dst_state->active_preempt_lock = src->active_preempt_lock;
 	dst_state->in_sleepable = src->in_sleepable;
 	dst_state->curframe = src->curframe;
 	dst_state->branches = src->branches;
@@ -5505,7 +5506,7 @@ static bool in_sleepable(struct bpf_verifier_env *env)
  */
 static bool in_rcu_cs(struct bpf_verifier_env *env)
 {
-	return env->cur_state->active_rcu_lock ||
+	return cur_func(env)->active_rcu_lock ||
 	       cur_func(env)->active_locks ||
 	       !in_sleepable(env);
 }
@@ -10009,7 +10010,7 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 
 		/* Only global subprogs cannot be called with preemption disabled. */
-		if (env->cur_state->active_preempt_lock) {
+		if (cur_func(env)->active_preempt_locks) {
 			verbose(env, "global function calls are not allowed with preemption disabled,\n"
 				     "use static function instead\n");
 			return -EINVAL;
@@ -10544,12 +10545,12 @@ static int check_resource_leak(struct bpf_verifier_env *env, bool exception_exit
 		return err;
 	}
 
-	if (check_lock && env->cur_state->active_rcu_lock) {
+	if (check_lock && cur_func(env)->active_rcu_lock) {
 		verbose(env, "%s cannot be used inside bpf_rcu_read_lock-ed region\n", prefix);
 		return -EINVAL;
 	}
 
-	if (check_lock && env->cur_state->active_preempt_lock) {
+	if (check_lock && cur_func(env)->active_preempt_locks) {
 		verbose(env, "%s cannot be used inside bpf_preempt_disable-ed region\n", prefix);
 		return -EINVAL;
 	}
@@ -10726,7 +10727,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		return err;
 	}
 
-	if (env->cur_state->active_rcu_lock) {
+	if (cur_func(env)->active_rcu_lock) {
 		if (fn->might_sleep) {
 			verbose(env, "sleepable helper %s#%d in rcu_read_lock region\n",
 				func_id_name(func_id), func_id);
@@ -10737,7 +10738,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
 	}
 
-	if (env->cur_state->active_preempt_lock) {
+	if (cur_func(env)->active_preempt_locks) {
 		if (fn->might_sleep) {
 			verbose(env, "sleepable helper %s#%d in non-preemptible region\n",
 				func_id_name(func_id), func_id);
@@ -12767,7 +12768,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	preempt_disable = is_kfunc_bpf_preempt_disable(&meta);
 	preempt_enable = is_kfunc_bpf_preempt_enable(&meta);
 
-	if (env->cur_state->active_rcu_lock) {
+	if (cur_func(env)->active_rcu_lock) {
 		struct bpf_func_state *state;
 		struct bpf_reg_state *reg;
 		u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER);
@@ -12787,29 +12788,29 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 					reg->type |= PTR_UNTRUSTED;
 				}
 			}));
-			env->cur_state->active_rcu_lock = false;
+			cur_func(env)->active_rcu_lock = false;
 		} else if (sleepable) {
 			verbose(env, "kernel func %s is sleepable within rcu_read_lock region\n", func_name);
 			return -EACCES;
 		}
 	} else if (rcu_lock) {
-		env->cur_state->active_rcu_lock = true;
+		cur_func(env)->active_rcu_lock = true;
 	} else if (rcu_unlock) {
 		verbose(env, "unmatched rcu read unlock (kernel function %s)\n", func_name);
 		return -EINVAL;
 	}
 
-	if (env->cur_state->active_preempt_lock) {
+	if (cur_func(env)->active_preempt_locks) {
 		if (preempt_disable) {
-			env->cur_state->active_preempt_lock++;
+			cur_func(env)->active_preempt_locks++;
 		} else if (preempt_enable) {
-			env->cur_state->active_preempt_lock--;
+			cur_func(env)->active_preempt_locks--;
 		} else if (sleepable) {
 			verbose(env, "kernel func %s is sleepable within non-preemptible region\n", func_name);
 			return -EACCES;
 		}
 	} else if (preempt_disable) {
-		env->cur_state->active_preempt_lock++;
+		cur_func(env)->active_preempt_locks++;
 	} else if (preempt_enable) {
 		verbose(env, "unmatched attempt to enable preemption (kernel function %s)\n", func_name);
 		return -EINVAL;
@@ -17768,6 +17769,15 @@ static bool ressafe(struct bpf_func_state *old, struct bpf_func_state *cur,
 	if (old->acquired_res != cur->acquired_res)
 		return false;
 
+	if (old->active_locks != cur->active_locks)
+		return false;
+
+	if (old->active_preempt_locks != cur->active_preempt_locks)
+		return false;
+
+	if (old->active_rcu_lock != cur->active_rcu_lock)
+		return false;
+
 	for (i = 0; i < old->acquired_res; i++) {
 		if (!check_ids(old->res[i].id, cur->res[i].id, idmap) ||
 		    old->res[i].type != cur->res[i].type)
@@ -17860,12 +17870,6 @@ static bool states_equal(struct bpf_verifier_env *env,
 	if (old->speculative && !cur->speculative)
 		return false;
 
-	if (old->active_rcu_lock != cur->active_rcu_lock)
-		return false;
-
-	if (old->active_preempt_lock != cur->active_preempt_lock)
-		return false;
-
 	if (old->in_sleepable != cur->in_sleepable)
 		return false;
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state
  2024-11-21  0:53 ` [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state Kumar Kartikeya Dwivedi
@ 2024-11-21 18:09   ` Eduard Zingerman
  2024-11-21 18:12     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 18:09 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> To ensure consistency in resource handling, move RCU and preemption
> state counters to bpf_func_state, and convert all users to access them
> through cur_func(env).
> 
> For the sake of consistency, also compare active_locks in ressafe as a
> quick way to eliminate iteration and entry matching if the number of
> locks are not the same.
> 
> OTOH, the comparison of active_preempt_locks and active_rcu_lock is
> needed for correctness, as state exploration cannot be avoided if these
> counters do not match, and not comparing them will lead to problems
> since they lack an actual entry in the acquired_res array.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

This change is a bit confusing to me.
The following is done currently:
- in setup_func_entry() called from check_func_call():
  copy_resource_state(callee, caller);
- in prepare_func_exit():
  copy_resource_state(caller, callee);

So it seems that it is logical to track resources in the
bpf_verifier_state and avoid copying.
There is probably something I don't understand.

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state
  2024-11-21 18:09   ` Eduard Zingerman
@ 2024-11-21 18:12     ` Kumar Kartikeya Dwivedi
  2024-11-21 18:54       ` Eduard Zingerman
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21 18:12 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 21 Nov 2024 at 19:09, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > To ensure consistency in resource handling, move RCU and preemption
> > state counters to bpf_func_state, and convert all users to access them
> > through cur_func(env).
> >
> > For the sake of consistency, also compare active_locks in ressafe as a
> > quick way to eliminate iteration and entry matching if the number of
> > locks are not the same.
> >
> > OTOH, the comparison of active_preempt_locks and active_rcu_lock is
> > needed for correctness, as state exploration cannot be avoided if these
> > counters do not match, and not comparing them will lead to problems
> > since they lack an actual entry in the acquired_res array.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> This change is a bit confusing to me.
> The following is done currently:
> - in setup_func_entry() called from check_func_call():
>   copy_resource_state(callee, caller);
> - in prepare_func_exit():
>   copy_resource_state(caller, callee);
>
> So it seems that it is logical to track resources in the
> bpf_verifier_state and avoid copying.
> There is probably something I don't understand.
>

This is what we were doing all along, and you're right, it is sort of
a global entity.
But we've moved active_locks to bpf_func_state, where references reside, while
RCU and preempt lock state stays in verifier state. Either everything
should be in
cur_func, or in bpf_verifier_state. I am fine with either of them,
because it would
materially does not matter too much.

Alexei's preference has been stashing this in bpf_func_state instead in [0].
Let me know what you think.

  [0] https://lore.kernel.org/bpf/CAADnVQKxgE7=WhjNckvMDTZ5GZujPuT3Dqd+sY=pW8CWoaF9FA@mail.gmail.com

> [...]
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state
  2024-11-21 18:12     ` Kumar Kartikeya Dwivedi
@ 2024-11-21 18:54       ` Eduard Zingerman
  2024-11-21 22:04         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 18:54 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 2024-11-21 at 19:12 +0100, Kumar Kartikeya Dwivedi wrote:
> On Thu, 21 Nov 2024 at 19:09, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > > To ensure consistency in resource handling, move RCU and preemption
> > > state counters to bpf_func_state, and convert all users to access them
> > > through cur_func(env).
> > > 
> > > For the sake of consistency, also compare active_locks in ressafe as a
> > > quick way to eliminate iteration and entry matching if the number of
> > > locks are not the same.
> > > 
> > > OTOH, the comparison of active_preempt_locks and active_rcu_lock is
> > > needed for correctness, as state exploration cannot be avoided if these
> > > counters do not match, and not comparing them will lead to problems
> > > since they lack an actual entry in the acquired_res array.
> > > 
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > 
> > This change is a bit confusing to me.
> > The following is done currently:
> > - in setup_func_entry() called from check_func_call():
> >   copy_resource_state(callee, caller);
> > - in prepare_func_exit():
> >   copy_resource_state(caller, callee);
> > 
> > So it seems that it is logical to track resources in the
> > bpf_verifier_state and avoid copying.
> > There is probably something I don't understand.
> > 
> 
> This is what we were doing all along, and you're right, it is sort of
> a global entity.

Right, but since this patch-set does a refactoring,
might be a good time to change.

> But we've moved active_locks to bpf_func_state, where references reside, while
> RCU and preempt lock state stays in verifier state. Either everything
> should be in
> cur_func, or in bpf_verifier_state. I am fine with either of them,
> because it would
> materially does not matter too much.
> 
> Alexei's preference has been stashing this in bpf_func_state instead in [0].
> Let me know what you think.
> 
>   [0] https://lore.kernel.org/bpf/CAADnVQKxgE7=WhjNckvMDTZ5GZujPuT3Dqd+sY=pW8CWoaF9FA@mail.gmail.com

As far as I understand check_func_call(), function calls to static
functions are allowed while holding each kind of resources currently
tracked. So it seems odd to track it as a part of function state.
The way I understand Alexei in the thread [0] the idea is more
to track all counters in one place.

Let's wait what Alexei has to say.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state
  2024-11-21 18:54       ` Eduard Zingerman
@ 2024-11-21 22:04         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21 22:04 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 21 Nov 2024 at 19:54, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-11-21 at 19:12 +0100, Kumar Kartikeya Dwivedi wrote:
> > On Thu, 21 Nov 2024 at 19:09, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > > > To ensure consistency in resource handling, move RCU and preemption
> > > > state counters to bpf_func_state, and convert all users to access them
> > > > through cur_func(env).
> > > >
> > > > For the sake of consistency, also compare active_locks in ressafe as a
> > > > quick way to eliminate iteration and entry matching if the number of
> > > > locks are not the same.
> > > >
> > > > OTOH, the comparison of active_preempt_locks and active_rcu_lock is
> > > > needed for correctness, as state exploration cannot be avoided if these
> > > > counters do not match, and not comparing them will lead to problems
> > > > since they lack an actual entry in the acquired_res array.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > >
> > > This change is a bit confusing to me.
> > > The following is done currently:
> > > - in setup_func_entry() called from check_func_call():
> > >   copy_resource_state(callee, caller);
> > > - in prepare_func_exit():
> > >   copy_resource_state(caller, callee);
> > >
> > > So it seems that it is logical to track resources in the
> > > bpf_verifier_state and avoid copying.
> > > There is probably something I don't understand.
> > >
> >
> > This is what we were doing all along, and you're right, it is sort of
> > a global entity.
>
> Right, but since this patch-set does a refactoring,
> might be a good time to change.
>
> > But we've moved active_locks to bpf_func_state, where references reside, while
> > RCU and preempt lock state stays in verifier state. Either everything
> > should be in
> > cur_func, or in bpf_verifier_state. I am fine with either of them,
> > because it would
> > materially does not matter too much.
> >
> > Alexei's preference has been stashing this in bpf_func_state instead in [0].
> > Let me know what you think.
> >
> >   [0] https://lore.kernel.org/bpf/CAADnVQKxgE7=WhjNckvMDTZ5GZujPuT3Dqd+sY=pW8CWoaF9FA@mail.gmail.com
>
> As far as I understand check_func_call(), function calls to static
> functions are allowed while holding each kind of resources currently
> tracked. So it seems odd to track it as a part of function state.
> The way I understand Alexei in the thread [0] the idea is more
> to track all counters in one place.
>
> Let's wait what Alexei has to say.
>

Discussed with Alexei (who discussed with you I presume) that we're
doing this in bpf_verifier_state, will fix.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v1 4/7] bpf: Refactor mark_{dynptr,iter}_read
  2024-11-21  0:53 [PATCH bpf-next v1 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2024-11-21  0:53 ` [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state Kumar Kartikeya Dwivedi
@ 2024-11-21  0:53 ` Kumar Kartikeya Dwivedi
  2024-11-21 18:00   ` Eduard Zingerman
  2024-11-21  0:53 ` [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21  0:53 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

There is possibility of sharing code between mark_dynptr_read and
mark_iter_read for updating liveness information of their stack slots.
Consolidate common logic into mark_stack_slot_obj_read function in
preparation for the next patch which needs the same logic for its own
stack slots.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 25c44b68f16a..6cd2bbed4583 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3213,10 +3213,27 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int mark_dynptr_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+static int mark_stack_slot_obj_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				    int spi, int nr_slots)
 {
 	struct bpf_func_state *state = func(env, reg);
-	int spi, ret;
+	int err, i;
+
+	for (i = 0; i < nr_slots; i++) {
+		struct bpf_reg_state *st = &state->stack[spi - i].spilled_ptr;
+
+		err = mark_reg_read(env, st, st->parent, REG_LIVE_READ64);
+		if (err)
+			return err;
+
+		mark_stack_slot_scratched(env, spi - i);
+	}
+	return 0;
+}
+
+static int mark_dynptr_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	int spi;
 
 	/* For CONST_PTR_TO_DYNPTR, it must have already been done by
 	 * check_reg_arg in check_helper_call and mark_btf_func_reg_size in
@@ -3231,31 +3248,13 @@ static int mark_dynptr_read(struct bpf_verifier_env *env, struct bpf_reg_state *
 	 * bounds and spi is the first dynptr slot. Simply mark stack slot as
 	 * read.
 	 */
-	ret = mark_reg_read(env, &state->stack[spi].spilled_ptr,
-			    state->stack[spi].spilled_ptr.parent, REG_LIVE_READ64);
-	if (ret)
-		return ret;
-	return mark_reg_read(env, &state->stack[spi - 1].spilled_ptr,
-			     state->stack[spi - 1].spilled_ptr.parent, REG_LIVE_READ64);
+	return mark_stack_slot_obj_read(env, reg, spi, BPF_DYNPTR_NR_SLOTS);
 }
 
 static int mark_iter_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 			  int spi, int nr_slots)
 {
-	struct bpf_func_state *state = func(env, reg);
-	int err, i;
-
-	for (i = 0; i < nr_slots; i++) {
-		struct bpf_reg_state *st = &state->stack[spi - i].spilled_ptr;
-
-		err = mark_reg_read(env, st, st->parent, REG_LIVE_READ64);
-		if (err)
-			return err;
-
-		mark_stack_slot_scratched(env, spi - i);
-	}
-
-	return 0;
+	return mark_stack_slot_obj_read(env, reg, spi, nr_slots);
 }
 
 /* This function is supposed to be used by the following 32-bit optimization
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 4/7] bpf: Refactor mark_{dynptr,iter}_read
  2024-11-21  0:53 ` [PATCH bpf-next v1 4/7] bpf: Refactor mark_{dynptr,iter}_read Kumar Kartikeya Dwivedi
@ 2024-11-21 18:00   ` Eduard Zingerman
  0 siblings, 0 replies; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 18:00 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> There is possibility of sharing code between mark_dynptr_read and
> mark_iter_read for updating liveness information of their stack slots.
> Consolidate common logic into mark_stack_slot_obj_read function in
> preparation for the next patch which needs the same logic for its own
> stack slots.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-21  0:53 [PATCH bpf-next v1 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2024-11-21  0:53 ` [PATCH bpf-next v1 4/7] bpf: Refactor mark_{dynptr,iter}_read Kumar Kartikeya Dwivedi
@ 2024-11-21  0:53 ` Kumar Kartikeya Dwivedi
  2024-11-21 20:21   ` Eduard Zingerman
  2024-11-21 22:46   ` kernel test robot
  2024-11-21  0:53 ` [PATCH bpf-next v1 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc Kumar Kartikeya Dwivedi
  2024-11-21  0:53 ` [PATCH bpf-next v1 7/7] selftests/bpf: Add IRQ save/restore tests Kumar Kartikeya Dwivedi
  6 siblings, 2 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21  0:53 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

Teach the verifier about IRQ-disabled sections through the introduction
of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
them, and bpf_local_irq_restore, to restore IRQ state and enable them
back again.

For the purposes of tracking the saved IRQ state, the verifier is taught
about a new special object on the stack of type STACK_IRQ_FLAG. This is
a 8 byte value which saves the IRQ flags which are to be passed back to
the IRQ restore kfunc.

To track a dynamic number of IRQ-disabled regions and their associated
saved states, a new resource type RES_TYPE_IRQ is introduced, which its
state management functions: acquire_irq_state and release_irq_state,
taking advantage of the refactoring and clean ups made in earlier
commits.

One notable requirement of the kernel's IRQ save and restore API is that
they cannot happen out of order. For this purpose, resource state is
extended with a new type-specific member 'prev_id'. This is used to
remember the ordering of acquisitions of IRQ saved states, so that we
maintain a logical stack in acquisition order of resource identities,
and can enforce LIFO ordering when restoring IRQ state. The top of the
stack is maintained using bpf_func_state's active_irq_id.

The logic to detect initialized and unitialized irq flag slots, marking
and unmarking is similar to how it's done for iterators. We do need to
update ressafe to perform check_ids based satisfiability check, and
additionally match prev_id for RES_TYPE_IRQ entries in the resource
array.

The kfuncs themselves are plain wrappers over local_irq_save and
local_irq_restore macros.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  19 ++-
 kernel/bpf/helpers.c         |  24 +++
 kernel/bpf/log.c             |   1 +
 kernel/bpf/verifier.c        | 283 ++++++++++++++++++++++++++++++++++-
 4 files changed, 322 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index fa09538a35bc..f44961dccbac 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -233,6 +233,7 @@ enum bpf_stack_slot_type {
 	 */
 	STACK_DYNPTR,
 	STACK_ITER,
+	STACK_IRQ_FLAG,
 };
 
 #define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
@@ -253,6 +254,9 @@ struct bpf_resource_state {
 	enum res_state_type {
 		RES_TYPE_INV = -1,
 		RES_TYPE_PTR = 0,
+		RES_TYPE_IRQ,
+
+		__RES_TYPE_LOCK_BEGIN,
 		RES_TYPE_LOCK,
 	} type;
 	/* Track each resource created with a unique id, even if the same
@@ -263,10 +267,16 @@ struct bpf_resource_state {
 	 * is used purely to inform the user of a resource leak.
 	 */
 	int insn_idx;
-	/* Use to keep track of the source object of a lock, to ensure
-	 * it matches on unlock.
-	 */
-	void *ptr;
+	union {
+		/* Use to keep track of the source object of a lock, to ensure
+		 * it matches on unlock.
+		 */
+		void *ptr;
+		/* Track the reference id preceding the IRQ entry in acquisition
+		 * order, to enforce an ordering on the release.
+		 */
+		int prev_id;
+	};
 };
 
 struct bpf_retval_range {
@@ -317,6 +327,7 @@ struct bpf_func_state {
 	int active_locks;
 	int active_preempt_locks;
 	bool active_rcu_lock;
+	int active_irq_id;
 	struct bpf_resource_state *res;
 	/* The state of the stack. Each element of the array describes BPF_REG_SIZE
 	 * (i.e. 8) bytes worth of stack memory.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 751c150f9e1c..302f0d5976be 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3057,6 +3057,28 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
 	return ret + 1;
 }
 
+/* Keep unsinged long in prototype so that kfunc is usable when emitted to
+ * vmlinux.h in BPF programs directly, but since unsigned long may potentially
+ * be 4 byte, always cast to u64 when reading/writing from this pointer as it
+ * always points to an 8-byte memory region in BPF stack.
+ */
+__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)
+{
+	u64 *ptr = (u64 *)flags__irq_flag;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	*ptr = flags;
+}
+
+__bpf_kfunc void bpf_local_irq_restore(unsigned long *flags__irq_flag)
+{
+	u64 *ptr = (u64 *)flags__irq_flag;
+	unsigned long flags = *ptr;
+
+	local_irq_restore(flags);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(generic_btf_ids)
@@ -3149,6 +3171,8 @@ BTF_ID_FLAGS(func, bpf_get_kmem_cache)
 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_new, KF_ITER_NEW | KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_local_irq_save)
+BTF_ID_FLAGS(func, bpf_local_irq_restore)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 0ad6f0737c57..fc5520782e5d 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -537,6 +537,7 @@ static char slot_type_char[] = {
 	[STACK_ZERO]	= '0',
 	[STACK_DYNPTR]	= 'd',
 	[STACK_ITER]	= 'i',
+	[STACK_IRQ_FLAG] = 'f'
 };
 
 static void print_liveness(struct bpf_verifier_env *env,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6cd2bbed4583..67ffcbb963bd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -660,6 +660,11 @@ static int iter_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 	return stack_slot_obj_get_spi(env, reg, "iter", nr_slots);
 }
 
+static int irq_flag_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	return stack_slot_obj_get_spi(env, reg, "irq_flag", 1);
+}
+
 static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type)
 {
 	switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
@@ -1155,10 +1160,126 @@ static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_s
 	return 0;
 }
 
+static int acquire_irq_state(struct bpf_verifier_env *env, int insn_idx);
+static int release_irq_state(struct bpf_func_state *state, int id);
+
+static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env,
+				     struct bpf_kfunc_call_arg_meta *meta,
+				     struct bpf_reg_state *reg, int insn_idx)
+{
+	struct bpf_func_state *state = func(env, reg);
+	struct bpf_stack_state *slot;
+	struct bpf_reg_state *st;
+	int spi, i, id;
+
+	spi = irq_flag_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+
+	id = acquire_irq_state(env, insn_idx);
+	if (id < 0)
+		return id;
+
+	slot = &state->stack[spi];
+	st = &slot->spilled_ptr;
+
+	__mark_reg_known_zero(st);
+	st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
+	st->live |= REG_LIVE_WRITTEN;
+	st->ref_obj_id = id;
+
+	for (i = 0; i < BPF_REG_SIZE; i++)
+		slot->slot_type[i] = STACK_IRQ_FLAG;
+
+	mark_stack_slot_scratched(env, spi);
+	return 0;
+}
+
+static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	struct bpf_stack_state *slot;
+	struct bpf_reg_state *st;
+	int spi, i, err;
+
+	spi = irq_flag_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+
+	slot = &state->stack[spi];
+	st = &slot->spilled_ptr;
+
+	err = release_irq_state(cur_func(env), st->ref_obj_id);
+	WARN_ON_ONCE(err && err != -EPROTO);
+	if (err) {
+		verbose(env, "cannot restore irq state out of order\n");
+		return err;
+	}
+
+	__mark_reg_not_init(env, st);
+
+	/* see unmark_stack_slots_dynptr() for why we need to set REG_LIVE_WRITTEN */
+	st->live |= REG_LIVE_WRITTEN;
+
+	for (i = 0; i < BPF_REG_SIZE; i++)
+		slot->slot_type[i] = STACK_INVALID;
+
+	mark_stack_slot_scratched(env, spi - i);
+	return 0;
+}
+
+static bool is_irq_flag_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	struct bpf_stack_state *slot;
+	int spi, i;
+
+	/* For -ERANGE (i.e. spi not falling into allocated stack slots), we
+	 * will do check_mem_access to check and update stack bounds later, so
+	 * return true for that case.
+	 */
+	spi = irq_flag_get_spi(env, reg);
+	if (spi == -ERANGE)
+		return true;
+	if (spi < 0)
+		return false;
+
+	slot = &state->stack[spi];
+
+	for (i = 0; i < BPF_REG_SIZE; i++)
+		if (slot->slot_type[i] == STACK_IRQ_FLAG)
+			return false;
+	return true;
+}
+
+static int is_irq_flag_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	struct bpf_stack_state *slot;
+	struct bpf_reg_state *st;
+	int spi, i;
+
+	spi = irq_flag_get_spi(env, reg);
+	if (spi < 0)
+		return -EINVAL;
+
+	slot = &state->stack[spi];
+	st = &slot->spilled_ptr;
+
+	if (!st->ref_obj_id)
+		return -EINVAL;
+
+	for (i = 0; i < BPF_REG_SIZE; i++)
+		if (slot->slot_type[i] != STACK_IRQ_FLAG)
+			return -EINVAL;
+	return 0;
+}
+
 /* Check if given stack slot is "special":
  *   - spilled register state (STACK_SPILL);
  *   - dynptr state (STACK_DYNPTR);
  *   - iter state (STACK_ITER).
+ *   - irq flag state (STACK_IRQ_FLAG)
  */
 static bool is_stack_slot_special(const struct bpf_stack_state *stack)
 {
@@ -1168,6 +1289,7 @@ static bool is_stack_slot_special(const struct bpf_stack_state *stack)
 	case STACK_SPILL:
 	case STACK_DYNPTR:
 	case STACK_ITER:
+	case STACK_IRQ_FLAG:
 		return true;
 	case STACK_INVALID:
 	case STACK_MISC:
@@ -1291,6 +1413,7 @@ static int copy_resource_state(struct bpf_func_state *dst, const struct bpf_func
 	dst->active_locks = src->active_locks;
 	dst->active_preempt_locks = src->active_preempt_locks;
 	dst->active_rcu_lock = src->active_rcu_lock;
+	dst->active_irq_id = src->active_irq_id;
 	return 0;
 }
 
@@ -1398,6 +1521,22 @@ static int acquire_lock_state(struct bpf_verifier_env *env, int insn_idx, enum r
 	return 0;
 }
 
+static int acquire_irq_state(struct bpf_verifier_env *env, int insn_idx)
+{
+	struct bpf_func_state *state = cur_func(env);
+	struct bpf_resource_state *s;
+	int id;
+
+	s = acquire_resource_state(env, insn_idx, &id);
+	if (!s)
+		return -ENOMEM;
+	s->type = RES_TYPE_IRQ;
+	s->prev_id = state->active_irq_id;
+
+	state->active_irq_id = id;
+	return id;
+}
+
 static void erase_resource_state(struct bpf_func_state *state, int res_idx)
 {
 	int last_idx = state->acquired_res - 1;
@@ -1439,6 +1578,27 @@ static int release_lock_state(struct bpf_func_state *state, int type, int id, vo
 	return -EINVAL;
 }
 
+static int release_irq_state(struct bpf_func_state *state, int id)
+{
+	int i;
+
+	if (id != state->active_irq_id)
+		return -EPROTO;
+
+	for (i = 0; i < state->acquired_res; i++) {
+		if (state->res[i].type != RES_TYPE_IRQ)
+			continue;
+		if (state->res[i].id == id) {
+			int prev_id = state->res[i].prev_id;
+
+			erase_resource_state(state, i);
+			state->active_irq_id = prev_id;
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
 static struct bpf_resource_state *find_lock_state(struct bpf_func_state *state, enum res_state_type type,
 						   int id, void *ptr)
 {
@@ -1447,7 +1607,7 @@ static struct bpf_resource_state *find_lock_state(struct bpf_func_state *state,
 	for (i = 0; i < state->acquired_res; i++) {
 		struct bpf_resource_state *s = &state->res[i];
 
-		if (s->type == RES_TYPE_PTR || s->type != type)
+		if (s->type < __RES_TYPE_LOCK_BEGIN || s->type != type)
 			continue;
 
 		if (s->id == id && s->ptr == ptr)
@@ -3257,6 +3417,16 @@ static int mark_iter_read(struct bpf_verifier_env *env, struct bpf_reg_state *re
 	return mark_stack_slot_obj_read(env, reg, spi, nr_slots);
 }
 
+static int mark_irq_flag_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	int spi;
+
+	spi = irq_flag_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+	return mark_stack_slot_obj_read(env, reg, spi, 1);
+}
+
 /* This function is supposed to be used by the following 32-bit optimization
  * code only. It returns TRUE if the source or destination register operates
  * on 64-bit, otherwise return FALSE.
@@ -10015,6 +10185,12 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			return -EINVAL;
 		}
 
+		if (cur_func(env)->active_irq_id) {
+			verbose(env, "global function calls are not allowed with IRQs disabled,\n"
+				     "use static function instead\n");
+			return -EINVAL;
+		}
+
 		if (err) {
 			verbose(env, "Caller passes invalid args into func#%d ('%s')\n",
 				subprog, sub_name);
@@ -10544,6 +10720,11 @@ static int check_resource_leak(struct bpf_verifier_env *env, bool exception_exit
 		return err;
 	}
 
+	if (check_lock && cur_func(env)->active_irq_id) {
+		verbose(env, "%s cannot be used inside bpf_local_irq_save-ed region\n", prefix);
+		return -EINVAL;
+	}
+
 	if (check_lock && cur_func(env)->active_rcu_lock) {
 		verbose(env, "%s cannot be used inside bpf_rcu_read_lock-ed region\n", prefix);
 		return -EINVAL;
@@ -10748,6 +10929,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
 	}
 
+	if (cur_func(env)->active_irq_id) {
+		if (fn->might_sleep) {
+			verbose(env, "sleepable helper %s#%d in IRQ-disabled region\n",
+				func_id_name(func_id), func_id);
+			return -EINVAL;
+		}
+
+		if (in_sleepable(env) && is_storage_get_function(func_id))
+			env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
+	}
+
 	meta.func_id = func_id;
 	/* check args */
 	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
@@ -11309,6 +11501,11 @@ static bool is_kfunc_arg_const_str(const struct btf *btf, const struct btf_param
 	return btf_param_match_suffix(btf, arg, "__str");
 }
 
+static bool is_kfunc_arg_irq_flag(const struct btf *btf, const struct btf_param *arg)
+{
+	return btf_param_match_suffix(btf, arg, "__irq_flag");
+}
+
 static bool is_kfunc_arg_scalar_with_name(const struct btf *btf,
 					  const struct btf_param *arg,
 					  const char *name)
@@ -11462,6 +11659,7 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_CONST_STR,
 	KF_ARG_PTR_TO_MAP,
 	KF_ARG_PTR_TO_WORKQUEUE,
+	KF_ARG_PTR_TO_IRQ_FLAG,
 };
 
 enum special_kfunc_type {
@@ -11493,6 +11691,8 @@ enum special_kfunc_type {
 	KF_bpf_iter_css_task_new,
 	KF_bpf_session_cookie,
 	KF_bpf_get_kmem_cache,
+	KF_bpf_local_irq_save,
+	KF_bpf_local_irq_restore,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -11559,6 +11759,8 @@ BTF_ID(func, bpf_session_cookie)
 BTF_ID_UNUSED
 #endif
 BTF_ID(func, bpf_get_kmem_cache)
+BTF_ID(func, bpf_local_irq_save)
+BTF_ID(func, bpf_local_irq_restore)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -11649,6 +11851,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_wq(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_WORKQUEUE;
 
+	if (is_kfunc_arg_irq_flag(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_IRQ_FLAG;
+
 	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
 		if (!btf_type_is_struct(ref_t)) {
 			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -11752,6 +11957,54 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int process_irq_flag(struct bpf_verifier_env *env, int regno,
+			     struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	bool irq_save = false, irq_restore = false;
+	int err;
+
+	if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) {
+		irq_save = true;
+	} else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) {
+		irq_restore = true;
+	} else {
+		verbose(env, "verifier internal error: unknown irq flags kfunc\n");
+		return -EFAULT;
+	}
+
+	if (irq_save) {
+		if (!is_irq_flag_reg_valid_uninit(env, reg)) {
+			verbose(env, "expected uninitialized irq flag as arg#%d\n", regno);
+			return -EINVAL;
+		}
+
+		err = check_mem_access(env, env->insn_idx, regno, 0, BPF_DW, BPF_WRITE, -1, false, false);
+		if (err)
+			return err;
+
+		err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx);
+		if (err)
+			return err;
+	} else {
+		err = is_irq_flag_reg_valid_init(env, reg);
+		if (err) {
+			verbose(env, "expected an initialized irq flag as arg#%d\n", regno);
+			return err;
+		}
+
+		err = mark_irq_flag_read(env, reg);
+		if (err)
+			return err;
+
+		err = unmark_stack_slot_irq_flag(env, reg);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+
 static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
 	struct btf_record *rec = reg_btf_record(reg);
@@ -12341,6 +12594,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
 		case KF_ARG_PTR_TO_CONST_STR:
 		case KF_ARG_PTR_TO_WORKQUEUE:
+		case KF_ARG_PTR_TO_IRQ_FLAG:
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -12635,6 +12889,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (ret < 0)
 				return ret;
 			break;
+		case KF_ARG_PTR_TO_IRQ_FLAG:
+			if (reg->type != PTR_TO_STACK) {
+				verbose(env, "arg#%d doesn't point to an irq flag on stack\n", i);
+				return -EINVAL;
+			}
+			ret = process_irq_flag(env, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		}
 	}
 
@@ -12815,6 +13078,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return -EINVAL;
 	}
 
+	if (cur_func(env)->active_irq_id && sleepable) {
+		verbose(env, "kernel func %s is sleepable within IRQ-disabled region\n", func_name);
+		return -EACCES;
+	}
+
 	/* In case of release function, we get register number of refcounted
 	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
 	 */
@@ -17748,6 +18016,12 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 			    !check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
 				return false;
 			break;
+		case STACK_IRQ_FLAG:
+			old_reg = &old->stack[spi].spilled_ptr;
+			cur_reg = &cur->stack[spi].spilled_ptr;
+			if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
+				return false;
+			break;
 		case STACK_MISC:
 		case STACK_ZERO:
 		case STACK_INVALID:
@@ -17777,6 +18051,9 @@ static bool ressafe(struct bpf_func_state *old, struct bpf_func_state *cur,
 	if (old->active_rcu_lock != cur->active_rcu_lock)
 		return false;
 
+	if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap))
+		return false;
+
 	for (i = 0; i < old->acquired_res; i++) {
 		if (!check_ids(old->res[i].id, cur->res[i].id, idmap) ||
 		    old->res[i].type != cur->res[i].type)
@@ -17784,6 +18061,10 @@ static bool ressafe(struct bpf_func_state *old, struct bpf_func_state *cur,
 		switch (old->res[i].type) {
 		case RES_TYPE_PTR:
 			break;
+		case RES_TYPE_IRQ:
+			if (!check_ids(old->res[i].prev_id, cur->res[i].prev_id, idmap))
+				return false;
+			break;
 		case RES_TYPE_LOCK:
 			if (old->res[i].ptr != cur->res[i].ptr)
 				return false;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-21  0:53 ` [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
@ 2024-11-21 20:21   ` Eduard Zingerman
  2024-11-21 22:06     ` Kumar Kartikeya Dwivedi
  2024-11-21 22:46   ` kernel test robot
  1 sibling, 1 reply; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 20:21 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> Teach the verifier about IRQ-disabled sections through the introduction
> of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
> them, and bpf_local_irq_restore, to restore IRQ state and enable them
> back again.
> 
> For the purposes of tracking the saved IRQ state, the verifier is taught
> about a new special object on the stack of type STACK_IRQ_FLAG. This is
> a 8 byte value which saves the IRQ flags which are to be passed back to
> the IRQ restore kfunc.
> 
> To track a dynamic number of IRQ-disabled regions and their associated
> saved states, a new resource type RES_TYPE_IRQ is introduced, which its
> state management functions: acquire_irq_state and release_irq_state,
> taking advantage of the refactoring and clean ups made in earlier
> commits.
> 
> One notable requirement of the kernel's IRQ save and restore API is that
> they cannot happen out of order. For this purpose, resource state is
> extended with a new type-specific member 'prev_id'. This is used to
> remember the ordering of acquisitions of IRQ saved states, so that we
> maintain a logical stack in acquisition order of resource identities,
> and can enforce LIFO ordering when restoring IRQ state. The top of the
> stack is maintained using bpf_func_state's active_irq_id.
> 
> The logic to detect initialized and unitialized irq flag slots, marking
> and unmarking is similar to how it's done for iterators. We do need to
> update ressafe to perform check_ids based satisfiability check, and
> additionally match prev_id for RES_TYPE_IRQ entries in the resource
> array.
> 
> The kfuncs themselves are plain wrappers over local_irq_save and
> local_irq_restore macros.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

I think this matches what is done for iterators and dynptrs.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> @@ -263,10 +267,16 @@ struct bpf_resource_state {
>  	 * is used purely to inform the user of a resource leak.
>  	 */
>  	int insn_idx;
> -	/* Use to keep track of the source object of a lock, to ensure
> -	 * it matches on unlock.
> -	 */
> -	void *ptr;
> +	union {
> +		/* Use to keep track of the source object of a lock, to ensure
> +		 * it matches on unlock.
> +		 */
> +		void *ptr;
> +		/* Track the reference id preceding the IRQ entry in acquisition
> +		 * order, to enforce an ordering on the release.
> +		 */
> +		int prev_id;
> +	};

Nit:  Do we anticipate any other resource kinds that would need LIFO acquire/release?
      If we do, an alternative to prev_id would be to organize bpf_func_state->res as
      a stack (by changing erase_resource_state() implementation).

[...]

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 751c150f9e1c..302f0d5976be 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -3057,6 +3057,28 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
>  	return ret + 1;
>  }
>  
> +/* Keep unsinged long in prototype so that kfunc is usable when emitted to
> + * vmlinux.h in BPF programs directly, but since unsigned long may potentially
> + * be 4 byte, always cast to u64 when reading/writing from this pointer as it
> + * always points to an 8-byte memory region in BPF stack.
> + */
> +__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)

Nit: 'unsigned long long' is guaranteed to be at-least 64 bit.
     What would go wrong if 'u64' is used here?

> +{
> +	u64 *ptr = (u64 *)flags__irq_flag;
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	*ptr = flags;
> +}

[...]

> @@ -1447,7 +1607,7 @@ static struct bpf_resource_state *find_lock_state(struct bpf_func_state *state,
>  	for (i = 0; i < state->acquired_res; i++) {
>  		struct bpf_resource_state *s = &state->res[i];
>  
> -		if (s->type == RES_TYPE_PTR || s->type != type)
> +		if (s->type < __RES_TYPE_LOCK_BEGIN || s->type != type)

Nit: I think this would be easier to read if there was a bitmask
     associated with lock types.

>  			continue;
>  
>  		if (s->id == id && s->ptr == ptr)

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-21 20:21   ` Eduard Zingerman
@ 2024-11-21 22:06     ` Kumar Kartikeya Dwivedi
  2024-11-21 23:08       ` Eduard Zingerman
  2024-11-22  0:32       ` Alexei Starovoitov
  0 siblings, 2 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21 22:06 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 21 Nov 2024 at 21:21, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > Teach the verifier about IRQ-disabled sections through the introduction
> > of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
> > them, and bpf_local_irq_restore, to restore IRQ state and enable them
> > back again.
> >
> > For the purposes of tracking the saved IRQ state, the verifier is taught
> > about a new special object on the stack of type STACK_IRQ_FLAG. This is
> > a 8 byte value which saves the IRQ flags which are to be passed back to
> > the IRQ restore kfunc.
> >
> > To track a dynamic number of IRQ-disabled regions and their associated
> > saved states, a new resource type RES_TYPE_IRQ is introduced, which its
> > state management functions: acquire_irq_state and release_irq_state,
> > taking advantage of the refactoring and clean ups made in earlier
> > commits.
> >
> > One notable requirement of the kernel's IRQ save and restore API is that
> > they cannot happen out of order. For this purpose, resource state is
> > extended with a new type-specific member 'prev_id'. This is used to
> > remember the ordering of acquisitions of IRQ saved states, so that we
> > maintain a logical stack in acquisition order of resource identities,
> > and can enforce LIFO ordering when restoring IRQ state. The top of the
> > stack is maintained using bpf_func_state's active_irq_id.
> >
> > The logic to detect initialized and unitialized irq flag slots, marking
> > and unmarking is similar to how it's done for iterators. We do need to
> > update ressafe to perform check_ids based satisfiability check, and
> > additionally match prev_id for RES_TYPE_IRQ entries in the resource
> > array.
> >
> > The kfuncs themselves are plain wrappers over local_irq_save and
> > local_irq_restore macros.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> I think this matches what is done for iterators and dynptrs.
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > @@ -263,10 +267,16 @@ struct bpf_resource_state {
> >        * is used purely to inform the user of a resource leak.
> >        */
> >       int insn_idx;
> > -     /* Use to keep track of the source object of a lock, to ensure
> > -      * it matches on unlock.
> > -      */
> > -     void *ptr;
> > +     union {
> > +             /* Use to keep track of the source object of a lock, to ensure
> > +              * it matches on unlock.
> > +              */
> > +             void *ptr;
> > +             /* Track the reference id preceding the IRQ entry in acquisition
> > +              * order, to enforce an ordering on the release.
> > +              */
> > +             int prev_id;
> > +     };
>
> Nit:  Do we anticipate any other resource kinds that would need LIFO acquire/release?
>       If we do, an alternative to prev_id would be to organize bpf_func_state->res as
>       a stack (by changing erase_resource_state() implementation).

I don't think so, this was the weird case requiring such an ordering,
so I tried to find the least intrusive way.

>
> [...]
>
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 751c150f9e1c..302f0d5976be 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -3057,6 +3057,28 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
> >       return ret + 1;
> >  }
> >
> > +/* Keep unsinged long in prototype so that kfunc is usable when emitted to
> > + * vmlinux.h in BPF programs directly, but since unsigned long may potentially
> > + * be 4 byte, always cast to u64 when reading/writing from this pointer as it
> > + * always points to an 8-byte memory region in BPF stack.
> > + */
> > +__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)
>
> Nit: 'unsigned long long' is guaranteed to be at-least 64 bit.
>      What would go wrong if 'u64' is used here?

It goes like this:
If I make this unsigned long long * or u64 *, the kfunc emitted to
vmlinux.h expects a pointer of that type.
Typically, kernel code is always passing unsigned long flags to these
functions, and that's what people are used to.
Given for --target=bpf unsigned long * is always a 8-byte value, I
just did this, so that in kernels that are 32-bit,
we don't end up relying on unsigned long still being 8 when
fetching/storing flags on BPF stack.

>
> > +{
> > +     u64 *ptr = (u64 *)flags__irq_flag;
> > +     unsigned long flags;
> > +
> > +     local_irq_save(flags);
> > +     *ptr = flags;
> > +}
>
> [...]
>
> > @@ -1447,7 +1607,7 @@ static struct bpf_resource_state *find_lock_state(struct bpf_func_state *state,
> >       for (i = 0; i < state->acquired_res; i++) {
> >               struct bpf_resource_state *s = &state->res[i];
> >
> > -             if (s->type == RES_TYPE_PTR || s->type != type)
> > +             if (s->type < __RES_TYPE_LOCK_BEGIN || s->type != type)
>
> Nit: I think this would be easier to read if there was a bitmask
>      associated with lock types.

Ack, will fix.

>
> >                       continue;
> >
> >               if (s->id == id && s->ptr == ptr)
>
> [...]
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-21 22:06     ` Kumar Kartikeya Dwivedi
@ 2024-11-21 23:08       ` Eduard Zingerman
  2024-11-21 23:12         ` Kumar Kartikeya Dwivedi
  2024-11-22  0:32       ` Alexei Starovoitov
  1 sibling, 1 reply; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 23:08 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 2024-11-21 at 23:06 +0100, Kumar Kartikeya Dwivedi wrote:

[...]

> > > +/* Keep unsinged long in prototype so that kfunc is usable when emitted to
> > > + * vmlinux.h in BPF programs directly, but since unsigned long may potentially
> > > + * be 4 byte, always cast to u64 when reading/writing from this pointer as it
> > > + * always points to an 8-byte memory region in BPF stack.
> > > + */
> > > +__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)
> > 
> > Nit: 'unsigned long long' is guaranteed to be at-least 64 bit.
> >      What would go wrong if 'u64' is used here?
> 
> It goes like this:
> If I make this unsigned long long * or u64 *, the kfunc emitted to
> vmlinux.h expects a pointer of that type.
> Typically, kernel code is always passing unsigned long flags to these
> functions, and that's what people are used to.
> Given for --target=bpf unsigned long * is always a 8-byte value, I
> just did this, so that in kernels that are 32-bit,
> we don't end up relying on unsigned long still being 8 when
> fetching/storing flags on BPF stack.

So, the goal is to enable the following pattern:

  unsigned long flags;
  bpf_local_irq_save(&flags);

Right?

For a 32-bit system 'flags' would be 4 bytes long.
Consider the following example:

  unsigned long flags; // assume 'flags' and 'foo'
  int foo;             // are allocated sequentially.
  
  bpf_local_irq_save(&flags);

I think that in such case '*ptr = flags;' would overwrite foo.

[...]

> > > +{
> > > +     u64 *ptr = (u64 *)flags__irq_flag;
> > > +     unsigned long flags;
> > > +
> > > +     local_irq_save(flags);
> > > +     *ptr = flags;
> > > +}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-21 23:08       ` Eduard Zingerman
@ 2024-11-21 23:12         ` Kumar Kartikeya Dwivedi
  2024-11-22  0:30           ` Eduard Zingerman
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21 23:12 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Fri, 22 Nov 2024 at 00:08, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-11-21 at 23:06 +0100, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > > > +/* Keep unsinged long in prototype so that kfunc is usable when emitted to
> > > > + * vmlinux.h in BPF programs directly, but since unsigned long may potentially
> > > > + * be 4 byte, always cast to u64 when reading/writing from this pointer as it
> > > > + * always points to an 8-byte memory region in BPF stack.
> > > > + */
> > > > +__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)
> > >
> > > Nit: 'unsigned long long' is guaranteed to be at-least 64 bit.
> > >      What would go wrong if 'u64' is used here?
> >
> > It goes like this:
> > If I make this unsigned long long * or u64 *, the kfunc emitted to
> > vmlinux.h expects a pointer of that type.
> > Typically, kernel code is always passing unsigned long flags to these
> > functions, and that's what people are used to.
> > Given for --target=bpf unsigned long * is always a 8-byte value, I
> > just did this, so that in kernels that are 32-bit,
> > we don't end up relying on unsigned long still being 8 when
> > fetching/storing flags on BPF stack.
>
> So, the goal is to enable the following pattern:
>
>   unsigned long flags;
>   bpf_local_irq_save(&flags);
>
> Right?
>
> For a 32-bit system 'flags' would be 4 bytes long.
> Consider the following example:
>
>   unsigned long flags; // assume 'flags' and 'foo'
>   int foo;             // are allocated sequentially.
>
>   bpf_local_irq_save(&flags);
>
> I think that in such case '*ptr = flags;' would overwrite foo.

In the kernel or userspace, yes, but I'm assuming unsigned long will
always be 64-bit for target=BPF.
Would that be incorrect? This pattern will only happen within BPF programs.

>
> [...]
>
> > > > +{
> > > > +     u64 *ptr = (u64 *)flags__irq_flag;
> > > > +     unsigned long flags;
> > > > +
> > > > +     local_irq_save(flags);
> > > > +     *ptr = flags;
> > > > +}
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-21 23:12         ` Kumar Kartikeya Dwivedi
@ 2024-11-22  0:30           ` Eduard Zingerman
  0 siblings, 0 replies; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-22  0:30 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Fri, 2024-11-22 at 00:12 +0100, Kumar Kartikeya Dwivedi wrote:
> On Fri, 22 Nov 2024 at 00:08, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Thu, 2024-11-21 at 23:06 +0100, Kumar Kartikeya Dwivedi wrote:
> > 
> > [...]
> > 
> > > > > +/* Keep unsinged long in prototype so that kfunc is usable when emitted to
> > > > > + * vmlinux.h in BPF programs directly, but since unsigned long may potentially
> > > > > + * be 4 byte, always cast to u64 when reading/writing from this pointer as it
> > > > > + * always points to an 8-byte memory region in BPF stack.
> > > > > + */
> > > > > +__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)
> > > > 
> > > > Nit: 'unsigned long long' is guaranteed to be at-least 64 bit.
> > > >      What would go wrong if 'u64' is used here?
> > > 
> > > It goes like this:
> > > If I make this unsigned long long * or u64 *, the kfunc emitted to
> > > vmlinux.h expects a pointer of that type.
> > > Typically, kernel code is always passing unsigned long flags to these
> > > functions, and that's what people are used to.
> > > Given for --target=bpf unsigned long * is always a 8-byte value, I
> > > just did this, so that in kernels that are 32-bit,
> > > we don't end up relying on unsigned long still being 8 when
> > > fetching/storing flags on BPF stack.
> > 
> > So, the goal is to enable the following pattern:
> > 
> >   unsigned long flags;
> >   bpf_local_irq_save(&flags);
> > 
> > Right?
> > 
> > For a 32-bit system 'flags' would be 4 bytes long.
> > Consider the following example:
> > 
> >   unsigned long flags; // assume 'flags' and 'foo'
> >   int foo;             // are allocated sequentially.
> > 
> >   bpf_local_irq_save(&flags);
> > 
> > I think that in such case '*ptr = flags;' would overwrite foo.
> 
> In the kernel or userspace, yes, but I'm assuming unsigned long will
> always be 64-bit for target=BPF.
> Would that be incorrect? This pattern will only happen within BPF programs.

Discussed off-list.
Kumar is right, and there is no problem, as on BPF side 'unsigned
long' is always 8 bytes.

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-21 22:06     ` Kumar Kartikeya Dwivedi
  2024-11-21 23:08       ` Eduard Zingerman
@ 2024-11-22  0:32       ` Alexei Starovoitov
  2024-11-22  0:42         ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2024-11-22  0:32 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Eduard Zingerman, bpf, kkd, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Thu, Nov 21, 2024 at 2:07 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Thu, 21 Nov 2024 at 21:21, Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > > Teach the verifier about IRQ-disabled sections through the introduction
> > > of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
> > > them, and bpf_local_irq_restore, to restore IRQ state and enable them
> > > back again.
> > >
> > > For the purposes of tracking the saved IRQ state, the verifier is taught
> > > about a new special object on the stack of type STACK_IRQ_FLAG. This is
> > > a 8 byte value which saves the IRQ flags which are to be passed back to
> > > the IRQ restore kfunc.
> > >
> > > To track a dynamic number of IRQ-disabled regions and their associated
> > > saved states, a new resource type RES_TYPE_IRQ is introduced, which its
> > > state management functions: acquire_irq_state and release_irq_state,
> > > taking advantage of the refactoring and clean ups made in earlier
> > > commits.
> > >
> > > One notable requirement of the kernel's IRQ save and restore API is that
> > > they cannot happen out of order. For this purpose, resource state is
> > > extended with a new type-specific member 'prev_id'. This is used to
> > > remember the ordering of acquisitions of IRQ saved states, so that we
> > > maintain a logical stack in acquisition order of resource identities,
> > > and can enforce LIFO ordering when restoring IRQ state. The top of the
> > > stack is maintained using bpf_func_state's active_irq_id.
> > >
> > > The logic to detect initialized and unitialized irq flag slots, marking
> > > and unmarking is similar to how it's done for iterators. We do need to
> > > update ressafe to perform check_ids based satisfiability check, and
> > > additionally match prev_id for RES_TYPE_IRQ entries in the resource
> > > array.
> > >
> > > The kfuncs themselves are plain wrappers over local_irq_save and
> > > local_irq_restore macros.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> >
> > I think this matches what is done for iterators and dynptrs.
> >
> > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> >
> > [...]
> >
> > > @@ -263,10 +267,16 @@ struct bpf_resource_state {
> > >        * is used purely to inform the user of a resource leak.
> > >        */
> > >       int insn_idx;
> > > -     /* Use to keep track of the source object of a lock, to ensure
> > > -      * it matches on unlock.
> > > -      */
> > > -     void *ptr;
> > > +     union {
> > > +             /* Use to keep track of the source object of a lock, to ensure
> > > +              * it matches on unlock.
> > > +              */
> > > +             void *ptr;
> > > +             /* Track the reference id preceding the IRQ entry in acquisition
> > > +              * order, to enforce an ordering on the release.
> > > +              */
> > > +             int prev_id;
> > > +     };
> >
> > Nit:  Do we anticipate any other resource kinds that would need LIFO acquire/release?
> >       If we do, an alternative to prev_id would be to organize bpf_func_state->res as
> >       a stack (by changing erase_resource_state() implementation).
>
> I don't think so, this was the weird case requiring such an ordering,
> so I tried to find the least intrusive way.

Acquire_refs is already a stack.
Manual push/pop via prev_id looks unnecessary.
Just search the top of acquired_refs for id.
If it doesn't match error.

I don't like this bit either:
+ if (id != state->active_irq_id)
+               return -EPROTO;


Why invent new error codes for such conditions?
It's EACESS or EINVAL like everywhere else in the verifier.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-22  0:32       ` Alexei Starovoitov
@ 2024-11-22  0:42         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-22  0:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, bpf, kkd, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Kernel Team

On Fri, 22 Nov 2024 at 01:32, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 21, 2024 at 2:07 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Thu, 21 Nov 2024 at 21:21, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > > > Teach the verifier about IRQ-disabled sections through the introduction
> > > > of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
> > > > them, and bpf_local_irq_restore, to restore IRQ state and enable them
> > > > back again.
> > > >
> > > > For the purposes of tracking the saved IRQ state, the verifier is taught
> > > > about a new special object on the stack of type STACK_IRQ_FLAG. This is
> > > > a 8 byte value which saves the IRQ flags which are to be passed back to
> > > > the IRQ restore kfunc.
> > > >
> > > > To track a dynamic number of IRQ-disabled regions and their associated
> > > > saved states, a new resource type RES_TYPE_IRQ is introduced, which its
> > > > state management functions: acquire_irq_state and release_irq_state,
> > > > taking advantage of the refactoring and clean ups made in earlier
> > > > commits.
> > > >
> > > > One notable requirement of the kernel's IRQ save and restore API is that
> > > > they cannot happen out of order. For this purpose, resource state is
> > > > extended with a new type-specific member 'prev_id'. This is used to
> > > > remember the ordering of acquisitions of IRQ saved states, so that we
> > > > maintain a logical stack in acquisition order of resource identities,
> > > > and can enforce LIFO ordering when restoring IRQ state. The top of the
> > > > stack is maintained using bpf_func_state's active_irq_id.
> > > >
> > > > The logic to detect initialized and unitialized irq flag slots, marking
> > > > and unmarking is similar to how it's done for iterators. We do need to
> > > > update ressafe to perform check_ids based satisfiability check, and
> > > > additionally match prev_id for RES_TYPE_IRQ entries in the resource
> > > > array.
> > > >
> > > > The kfuncs themselves are plain wrappers over local_irq_save and
> > > > local_irq_restore macros.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > >
> > > I think this matches what is done for iterators and dynptrs.
> > >
> > > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > >
> > > [...]
> > >
> > > > @@ -263,10 +267,16 @@ struct bpf_resource_state {
> > > >        * is used purely to inform the user of a resource leak.
> > > >        */
> > > >       int insn_idx;
> > > > -     /* Use to keep track of the source object of a lock, to ensure
> > > > -      * it matches on unlock.
> > > > -      */
> > > > -     void *ptr;
> > > > +     union {
> > > > +             /* Use to keep track of the source object of a lock, to ensure
> > > > +              * it matches on unlock.
> > > > +              */
> > > > +             void *ptr;
> > > > +             /* Track the reference id preceding the IRQ entry in acquisition
> > > > +              * order, to enforce an ordering on the release.
> > > > +              */
> > > > +             int prev_id;
> > > > +     };
> > >
> > > Nit:  Do we anticipate any other resource kinds that would need LIFO acquire/release?
> > >       If we do, an alternative to prev_id would be to organize bpf_func_state->res as
> > >       a stack (by changing erase_resource_state() implementation).
> >
> > I don't think so, this was the weird case requiring such an ordering,
> > so I tried to find the least intrusive way.
>
> Acquire_refs is already a stack.
> Manual push/pop via prev_id looks unnecessary.
> Just search the top of acquired_refs for id.
> If it doesn't match error.

Ok.

> I don't like this bit either:
> + if (id != state->active_irq_id)
> +               return -EPROTO;
>
>
> Why invent new error codes for such conditions?
> It's EACESS or EINVAL like everywhere else in the verifier.

Ok, though we do use EPROTO in a bunch of places. I just needed
something to distinguish the two errors.
I'll switch to these.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
  2024-11-21  0:53 ` [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
  2024-11-21 20:21   ` Eduard Zingerman
@ 2024-11-21 22:46   ` kernel test robot
  1 sibling, 0 replies; 30+ messages in thread
From: kernel test robot @ 2024-11-21 22:46 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: oe-kbuild-all, kkd, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, kernel-team

Hi Kumar,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/for-next]
[also build test WARNING on bpf-next/master linus/master next-20241121]
[cannot apply to bpf/master v6.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/bpf-Refactor-and-rename-resource-management/20241121-140722
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git for-next
patch link:    https://lore.kernel.org/r/20241121005329.408873-6-memxor%40gmail.com
patch subject: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}
config: arc-randconfig-001-20241122 (https://download.01.org/0day-ci/archive/20241122/202411220652.mArtMRmI-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241122/202411220652.mArtMRmI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411220652.mArtMRmI-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/bpf/verifier.c: In function 'process_irq_flag':
>> kernel/bpf/verifier.c:11964:32: warning: variable 'irq_restore' set but not used [-Wunused-but-set-variable]
   11964 |         bool irq_save = false, irq_restore = false;
         |                                ^~~~~~~~~~~


vim +/irq_restore +11964 kernel/bpf/verifier.c

 11959	
 11960	static int process_irq_flag(struct bpf_verifier_env *env, int regno,
 11961				     struct bpf_kfunc_call_arg_meta *meta)
 11962	{
 11963		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
 11964		bool irq_save = false, irq_restore = false;
 11965		int err;
 11966	
 11967		if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) {
 11968			irq_save = true;
 11969		} else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) {
 11970			irq_restore = true;
 11971		} else {
 11972			verbose(env, "verifier internal error: unknown irq flags kfunc\n");
 11973			return -EFAULT;
 11974		}
 11975	
 11976		if (irq_save) {
 11977			if (!is_irq_flag_reg_valid_uninit(env, reg)) {
 11978				verbose(env, "expected uninitialized irq flag as arg#%d\n", regno);
 11979				return -EINVAL;
 11980			}
 11981	
 11982			err = check_mem_access(env, env->insn_idx, regno, 0, BPF_DW, BPF_WRITE, -1, false, false);
 11983			if (err)
 11984				return err;
 11985	
 11986			err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx);
 11987			if (err)
 11988				return err;
 11989		} else {
 11990			err = is_irq_flag_reg_valid_init(env, reg);
 11991			if (err) {
 11992				verbose(env, "expected an initialized irq flag as arg#%d\n", regno);
 11993				return err;
 11994			}
 11995	
 11996			err = mark_irq_flag_read(env, reg);
 11997			if (err)
 11998				return err;
 11999	
 12000			err = unmark_stack_slot_irq_flag(env, reg);
 12001			if (err)
 12002				return err;
 12003		}
 12004		return 0;
 12005	}
 12006	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v1 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc
  2024-11-21  0:53 [PATCH bpf-next v1 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2024-11-21  0:53 ` [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
@ 2024-11-21  0:53 ` Kumar Kartikeya Dwivedi
  2024-11-21 20:23   ` Eduard Zingerman
  2024-11-21  0:53 ` [PATCH bpf-next v1 7/7] selftests/bpf: Add IRQ save/restore tests Kumar Kartikeya Dwivedi
  6 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21  0:53 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

For preemption-related kfuncs, we don't test their interaction with
sleepable kfuncs (we do test helpers) even though the verifier has
code to protect against such a pattern. Expand coverage of the selftest
to include this case.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/progs/preempt_lock.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/preempt_lock.c b/tools/testing/selftests/bpf/progs/preempt_lock.c
index 885377e83607..d24314c394c7 100644
--- a/tools/testing/selftests/bpf/progs/preempt_lock.c
+++ b/tools/testing/selftests/bpf/progs/preempt_lock.c
@@ -113,6 +113,18 @@ int preempt_sleepable_helper(void *ctx)
 	return 0;
 }
 
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("kernel func bpf_copy_from_user_str is sleepable within non-preemptible region")
+int preempt_sleepable_kfunc(void *ctx)
+{
+	u32 data;
+
+	bpf_preempt_disable();
+	bpf_copy_from_user_str(&data, sizeof(data), NULL, 0);
+	bpf_preempt_enable();
+	return 0;
+}
+
 int __noinline preempt_global_subprog(void)
 {
 	preempt_balance_subprog();
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc
  2024-11-21  0:53 ` [PATCH bpf-next v1 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc Kumar Kartikeya Dwivedi
@ 2024-11-21 20:23   ` Eduard Zingerman
  0 siblings, 0 replies; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 20:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> For preemption-related kfuncs, we don't test their interaction with
> sleepable kfuncs (we do test helpers) even though the verifier has
> code to protect against such a pattern. Expand coverage of the selftest
> to include this case.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v1 7/7] selftests/bpf: Add IRQ save/restore tests
  2024-11-21  0:53 [PATCH bpf-next v1 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2024-11-21  0:53 ` [PATCH bpf-next v1 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc Kumar Kartikeya Dwivedi
@ 2024-11-21  0:53 ` Kumar Kartikeya Dwivedi
  2024-11-21 20:43   ` Eduard Zingerman
  6 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21  0:53 UTC (permalink / raw)
  To: bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kernel-team

Include tests that check for rejection in erroneous cases, like
unbalanced IRQ-disabled counts, within and across subprogs, invalid IRQ
flag state or input to kfuncs, behavior upon overwriting IRQ saved state
on stack, interaction with sleepable kfuncs/helpers, global functions,
and out of order restore. Include some success scenarios as well to
demonstrate usage.

#123/1   irq/irq_restore_missing_1:OK
#123/2   irq/irq_restore_missing_2:OK
#123/3   irq/irq_restore_missing_3:OK
#123/4   irq/irq_restore_missing_3_minus_2:OK
#123/5   irq/irq_restore_missing_1_subprog:OK
#123/6   irq/irq_restore_missing_2_subprog:OK
#123/7   irq/irq_restore_missing_3_subprog:OK
#123/8   irq/irq_restore_missing_3_minus_2_subprog:OK
#123/9   irq/irq_balance:OK
#123/10  irq/irq_balance_n:OK
#123/11  irq/irq_balance_subprog:OK
#123/12  irq/irq_balance_n_subprog:OK
#123/13  irq/irq_global_subprog:OK
#123/14  irq/irq_restore_ooo:OK
#123/15  irq/irq_restore_ooo_3:OK
#123/16  irq/irq_restore_3_subprog:OK
#123/17  irq/irq_restore_4_subprog:OK
#123/18  irq/irq_restore_ooo_3_subprog:OK
#123/19  irq/irq_restore_invalid:OK
#123/20  irq/irq_save_invalid:OK
#123/21  irq/irq_restore_iter:OK
#123/22  irq/irq_save_iter:OK
#123/23  irq/irq_flag_overwrite:OK
#123/24  irq/irq_flag_overwrite_partial:OK
#123/25  irq/irq_sleepable_helper:OK
#123/26  irq/irq_sleepable_kfunc:OK
#123     irq:OK
Summary: 1/26 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/irq.c |   9 +
 tools/testing/selftests/bpf/progs/irq.c      | 393 +++++++++++++++++++
 2 files changed, 402 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/irq.c
 create mode 100644 tools/testing/selftests/bpf/progs/irq.c

diff --git a/tools/testing/selftests/bpf/prog_tests/irq.c b/tools/testing/selftests/bpf/prog_tests/irq.c
new file mode 100644
index 000000000000..496f4826ac37
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/irq.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include <irq.skel.h>
+
+void test_irq(void)
+{
+	RUN_TESTS(irq);
+}
diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c
new file mode 100644
index 000000000000..5301b66fc752
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/irq.c
@@ -0,0 +1,393 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_1(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	bpf_local_irq_save(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_2(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_3(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	bpf_local_irq_save(&flags3);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_3_minus_2(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	bpf_local_irq_save(&flags3);
+	bpf_local_irq_restore(&flags3);
+	bpf_local_irq_restore(&flags2);
+	return 0;
+}
+
+static __noinline void local_irq_save(unsigned long *flags)
+{
+	bpf_local_irq_save(flags);
+}
+
+static __noinline void local_irq_restore(unsigned long *flags)
+{
+	bpf_local_irq_restore(flags);
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_1_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	local_irq_save(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_2_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_3_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	local_irq_save(&flags3);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
+int irq_restore_missing_3_minus_2_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	local_irq_save(&flags3);
+	local_irq_restore(&flags3);
+	local_irq_restore(&flags2);
+	return 0;
+}
+
+SEC("?tc")
+__success
+int irq_balance(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	local_irq_save(&flags);
+	local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__success
+int irq_balance_n(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	local_irq_save(&flags3);
+	local_irq_restore(&flags3);
+	local_irq_restore(&flags2);
+	local_irq_restore(&flags1);
+	return 0;
+}
+
+static __noinline void local_irq_balance(void)
+{
+	unsigned long flags;
+
+	local_irq_save(&flags);
+	local_irq_restore(&flags);
+}
+
+static __noinline void local_irq_balance_n(void)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save(&flags1);
+	local_irq_save(&flags2);
+	local_irq_save(&flags3);
+	local_irq_restore(&flags3);
+	local_irq_restore(&flags2);
+	local_irq_restore(&flags1);
+}
+
+SEC("?tc")
+__success
+int irq_balance_subprog(struct __sk_buff *ctx)
+{
+	local_irq_balance();
+	return 0;
+}
+
+SEC("?tc")
+__success
+int irq_balance_n_subprog(struct __sk_buff *ctx)
+{
+	local_irq_balance_n();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("sleepable helper bpf_copy_from_user#")
+int irq_sleepable_helper(void *ctx)
+{
+	unsigned long flags;
+	u32 data;
+
+	local_irq_save(&flags);
+	bpf_copy_from_user(&data, sizeof(data), NULL);
+	local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("kernel func bpf_copy_from_user_str is sleepable within IRQ-disabled region")
+int irq_sleepable_kfunc(void *ctx)
+{
+	unsigned long flags;
+	u32 data;
+
+	local_irq_save(&flags);
+	bpf_copy_from_user_str(&data, sizeof(data), NULL, 0);
+	local_irq_restore(&flags);
+	return 0;
+}
+
+int __noinline global_local_irq_balance(void)
+{
+	local_irq_balance_n();
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("global function calls are not allowed with IRQs disabled")
+int irq_global_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	bpf_local_irq_save(&flags);
+	global_local_irq_balance();
+	bpf_local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_restore_ooo(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	bpf_local_irq_restore(&flags1);
+	bpf_local_irq_restore(&flags2);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_restore_ooo_3(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags2);
+	bpf_local_irq_restore(&flags2);
+	bpf_local_irq_save(&flags3);
+	bpf_local_irq_restore(&flags1);
+	bpf_local_irq_restore(&flags3);
+	return 0;
+}
+
+static __noinline void local_irq_save_3(unsigned long *flags1, unsigned long *flags2,
+					unsigned long *flags3)
+{
+	local_irq_save(flags1);
+	local_irq_save(flags2);
+	local_irq_save(flags3);
+}
+
+SEC("?tc")
+__success
+int irq_restore_3_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save_3(&flags1, &flags2, &flags3);
+	bpf_local_irq_restore(&flags3);
+	bpf_local_irq_restore(&flags2);
+	bpf_local_irq_restore(&flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_restore_4_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+	unsigned long flags4;
+
+	local_irq_save_3(&flags1, &flags2, &flags3);
+	bpf_local_irq_restore(&flags3);
+	bpf_local_irq_save(&flags4);
+	bpf_local_irq_restore(&flags4);
+	bpf_local_irq_restore(&flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_restore_ooo_3_subprog(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags2;
+	unsigned long flags3;
+
+	local_irq_save_3(&flags1, &flags2, &flags3);
+	bpf_local_irq_restore(&flags3);
+	bpf_local_irq_restore(&flags2);
+	bpf_local_irq_save(&flags3);
+	bpf_local_irq_restore(&flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected an initialized")
+int irq_restore_invalid(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+	unsigned long flags = 0xfaceb00c;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected uninitialized")
+int irq_save_invalid(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+
+	bpf_local_irq_save(&flags1);
+	bpf_local_irq_save(&flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected an initialized")
+int irq_restore_iter(struct __sk_buff *ctx)
+{
+	struct bpf_iter_num it;
+
+	bpf_iter_num_new(&it, 0, 42);
+	bpf_local_irq_restore((unsigned long *)&it);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference id=1")
+int irq_save_iter(struct __sk_buff *ctx)
+{
+	struct bpf_iter_num it;
+
+	/* Ensure same sized slot has st->ref_obj_id set, so we reject based on
+	 * slot_type != STACK_IRQ_FLAG...
+	 */
+	_Static_assert(sizeof(it) == sizeof(unsigned long), "broken iterator size");
+
+	bpf_iter_num_new(&it, 0, 42);
+	bpf_local_irq_save((unsigned long *)&it);
+	bpf_local_irq_restore((unsigned long *)&it);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected an initialized")
+int irq_flag_overwrite(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	bpf_local_irq_save(&flags);
+	flags = 0xdeadbeef;
+	bpf_local_irq_restore(&flags);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("expected an initialized")
+int irq_flag_overwrite_partial(struct __sk_buff *ctx)
+{
+	unsigned long flags;
+
+	bpf_local_irq_save(&flags);
+	*(((char *)&flags) + 1) = 0xff;
+	bpf_local_irq_restore(&flags);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 7/7] selftests/bpf: Add IRQ save/restore tests
  2024-11-21  0:53 ` [PATCH bpf-next v1 7/7] selftests/bpf: Add IRQ save/restore tests Kumar Kartikeya Dwivedi
@ 2024-11-21 20:43   ` Eduard Zingerman
  2024-11-21 22:07     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 20:43 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> Include tests that check for rejection in erroneous cases, like
> unbalanced IRQ-disabled counts, within and across subprogs, invalid IRQ
> flag state or input to kfuncs, behavior upon overwriting IRQ saved state
> on stack, interaction with sleepable kfuncs/helpers, global functions,
> and out of order restore. Include some success scenarios as well to
> demonstrate usage.
> 
> #123/1   irq/irq_restore_missing_1:OK
> #123/2   irq/irq_restore_missing_2:OK
> #123/3   irq/irq_restore_missing_3:OK
> #123/4   irq/irq_restore_missing_3_minus_2:OK
> #123/5   irq/irq_restore_missing_1_subprog:OK
> #123/6   irq/irq_restore_missing_2_subprog:OK
> #123/7   irq/irq_restore_missing_3_subprog:OK
> #123/8   irq/irq_restore_missing_3_minus_2_subprog:OK
> #123/9   irq/irq_balance:OK
> #123/10  irq/irq_balance_n:OK
> #123/11  irq/irq_balance_subprog:OK
> #123/12  irq/irq_balance_n_subprog:OK
> #123/13  irq/irq_global_subprog:OK
> #123/14  irq/irq_restore_ooo:OK
> #123/15  irq/irq_restore_ooo_3:OK
> #123/16  irq/irq_restore_3_subprog:OK
> #123/17  irq/irq_restore_4_subprog:OK
> #123/18  irq/irq_restore_ooo_3_subprog:OK
> #123/19  irq/irq_restore_invalid:OK
> #123/20  irq/irq_save_invalid:OK
> #123/21  irq/irq_restore_iter:OK
> #123/22  irq/irq_save_iter:OK
> #123/23  irq/irq_flag_overwrite:OK
> #123/24  irq/irq_flag_overwrite_partial:OK
> #123/25  irq/irq_sleepable_helper:OK
> #123/26  irq/irq_sleepable_kfunc:OK
> #123     irq:OK
> Summary: 1/26 PASSED, 0 SKIPPED, 0 FAILED
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

The following error condition is not tested:
"arg#%d doesn't point to an irq flag on stack".
Also, I think a few tests are excessive.
Otherwise looks good.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> diff --git a/tools/testing/selftests/bpf/prog_tests/irq.c b/tools/testing/selftests/bpf/prog_tests/irq.c
> new file mode 100644
> index 000000000000..496f4826ac37
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/irq.c
> @@ -0,0 +1,9 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> +#include <test_progs.h>
> +#include <irq.skel.h>
> +
> +void test_irq(void)
> +{
> +	RUN_TESTS(irq);
> +}

Nit: tools/testing/selftests/bpf/prog_tests/verifier.c
     could be used instead of a separate file.

> diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c
> new file mode 100644
> index 000000000000..5301b66fc752
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/irq.c
> @@ -0,0 +1,393 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> +#include <vmlinux.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_misc.h"
> +
> +SEC("?tc")
> +__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")

Nit: I know this is not a fault of this series, but the error message
     is sort of confusing. BPF_EXIT is allowed for irq saved region,
     just it has to be an exit from a sub-program, not a whole program.

> +int irq_restore_missing_1(struct __sk_buff *ctx)
> +{
> +	unsigned long flags;
> +
> +	bpf_local_irq_save(&flags);
> +	return 0;
> +}

[...]

Nit: don't think this test adds much compared to irq_restore_missing_2.

> +{
> +	unsigned long flags1;
> +	unsigned long flags2;
> +	unsigned long flags3;
> +
> +	bpf_local_irq_save(&flags1);
> +	bpf_local_irq_save(&flags2);
> +	bpf_local_irq_save(&flags3);
> +	return 0;
> +}

[...]

> +SEC("?tc")
> +__success
> +int irq_balance_n_subprog(struct __sk_buff *ctx)

Nit: don't think this test adds much given irq_balance_n()
     and irq_balance_subprog().

> +{
> +	local_irq_balance_n();
> +	return 0;
> +}

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 7/7] selftests/bpf: Add IRQ save/restore tests
  2024-11-21 20:43   ` Eduard Zingerman
@ 2024-11-21 22:07     ` Kumar Kartikeya Dwivedi
  2024-11-21 23:09       ` Eduard Zingerman
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-11-21 22:07 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 21 Nov 2024 at 21:43, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > Include tests that check for rejection in erroneous cases, like
> > unbalanced IRQ-disabled counts, within and across subprogs, invalid IRQ
> > flag state or input to kfuncs, behavior upon overwriting IRQ saved state
> > on stack, interaction with sleepable kfuncs/helpers, global functions,
> > and out of order restore. Include some success scenarios as well to
> > demonstrate usage.
> >
> > #123/1   irq/irq_restore_missing_1:OK
> > #123/2   irq/irq_restore_missing_2:OK
> > #123/3   irq/irq_restore_missing_3:OK
> > #123/4   irq/irq_restore_missing_3_minus_2:OK
> > #123/5   irq/irq_restore_missing_1_subprog:OK
> > #123/6   irq/irq_restore_missing_2_subprog:OK
> > #123/7   irq/irq_restore_missing_3_subprog:OK
> > #123/8   irq/irq_restore_missing_3_minus_2_subprog:OK
> > #123/9   irq/irq_balance:OK
> > #123/10  irq/irq_balance_n:OK
> > #123/11  irq/irq_balance_subprog:OK
> > #123/12  irq/irq_balance_n_subprog:OK
> > #123/13  irq/irq_global_subprog:OK
> > #123/14  irq/irq_restore_ooo:OK
> > #123/15  irq/irq_restore_ooo_3:OK
> > #123/16  irq/irq_restore_3_subprog:OK
> > #123/17  irq/irq_restore_4_subprog:OK
> > #123/18  irq/irq_restore_ooo_3_subprog:OK
> > #123/19  irq/irq_restore_invalid:OK
> > #123/20  irq/irq_save_invalid:OK
> > #123/21  irq/irq_restore_iter:OK
> > #123/22  irq/irq_save_iter:OK
> > #123/23  irq/irq_flag_overwrite:OK
> > #123/24  irq/irq_flag_overwrite_partial:OK
> > #123/25  irq/irq_sleepable_helper:OK
> > #123/26  irq/irq_sleepable_kfunc:OK
> > #123     irq:OK
> > Summary: 1/26 PASSED, 0 SKIPPED, 0 FAILED
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> The following error condition is not tested:
> "arg#%d doesn't point to an irq flag on stack".
> Also, I think a few tests are excessive.
> Otherwise looks good.
>

Will add.

> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > diff --git a/tools/testing/selftests/bpf/prog_tests/irq.c b/tools/testing/selftests/bpf/prog_tests/irq.c
> > new file mode 100644
> > index 000000000000..496f4826ac37
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/irq.c
> > @@ -0,0 +1,9 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> > +#include <test_progs.h>
> > +#include <irq.skel.h>
> > +
> > +void test_irq(void)
> > +{
> > +     RUN_TESTS(irq);
> > +}
>
> Nit: tools/testing/selftests/bpf/prog_tests/verifier.c
>      could be used instead of a separate file.
>
> > diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c
> > new file mode 100644
> > index 000000000000..5301b66fc752
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/irq.c
> > @@ -0,0 +1,393 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> > +#include <vmlinux.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include "bpf_misc.h"
> > +
> > +SEC("?tc")
> > +__failure __msg("BPF_EXIT instruction cannot be used inside bpf_local_irq_save-ed region")
>
> Nit: I know this is not a fault of this series, but the error message
>      is sort of confusing. BPF_EXIT is allowed for irq saved region,
>      just it has to be an exit from a sub-program, not a whole program.
>
> > +int irq_restore_missing_1(struct __sk_buff *ctx)
> > +{
> > +     unsigned long flags;
> > +
> > +     bpf_local_irq_save(&flags);
> > +     return 0;
> > +}
>
> [...]
>
> Nit: don't think this test adds much compared to irq_restore_missing_2.
>
> > +{
> > +     unsigned long flags1;
> > +     unsigned long flags2;
> > +     unsigned long flags3;
> > +
> > +     bpf_local_irq_save(&flags1);
> > +     bpf_local_irq_save(&flags2);
> > +     bpf_local_irq_save(&flags3);
> > +     return 0;
> > +}
>
> [...]
>
> > +SEC("?tc")
> > +__success
> > +int irq_balance_n_subprog(struct __sk_buff *ctx)
>
> Nit: don't think this test adds much given irq_balance_n()
>      and irq_balance_subprog().

My idea with both of these was to ensure when the state is copied in
and out on calls and when we're doing one or more than one
save/restore (which links prev_id into active_irq_id etc.) we don't
have problems, so they were definitely testing different scenarios.
But with the move into bpf_verifier_state they will indeed become
redundant, so I'm going to drop them in v2.

>
> > +{
> > +     local_irq_balance_n();
> > +     return 0;
> > +}
>
> [...]
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v1 7/7] selftests/bpf: Add IRQ save/restore tests
  2024-11-21 22:07     ` Kumar Kartikeya Dwivedi
@ 2024-11-21 23:09       ` Eduard Zingerman
  0 siblings, 0 replies; 30+ messages in thread
From: Eduard Zingerman @ 2024-11-21 23:09 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, kkd, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, kernel-team

On Thu, 2024-11-21 at 23:07 +0100, Kumar Kartikeya Dwivedi wrote:

[...]

> > > +SEC("?tc")
> > > +__success
> > > +int irq_balance_n_subprog(struct __sk_buff *ctx)
> > 
> > Nit: don't think this test adds much given irq_balance_n()
> >      and irq_balance_subprog().
> 
> My idea with both of these was to ensure when the state is copied in
> and out on calls and when we're doing one or more than one
> save/restore (which links prev_id into active_irq_id etc.) we don't
> have problems, so they were definitely testing different scenarios.
> But with the move into bpf_verifier_state they will indeed become
> redundant, so I'm going to drop them in v2.

Understood, thank you for explaining.


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-11-22  0:42 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-21  0:53 [PATCH bpf-next v1 0/7] IRQ save/restore Kumar Kartikeya Dwivedi
2024-11-21  0:53 ` [PATCH bpf-next v1 1/7] bpf: Refactor and rename resource management Kumar Kartikeya Dwivedi
2024-11-21 16:57   ` Eduard Zingerman
2024-11-21 17:17     ` Kumar Kartikeya Dwivedi
2024-11-22  0:24   ` Alexei Starovoitov
2024-11-22  0:31     ` Kumar Kartikeya Dwivedi
2024-11-21  0:53 ` [PATCH bpf-next v1 2/7] bpf: Be consistent between {acquire,find,release}_lock_state Kumar Kartikeya Dwivedi
2024-11-21 17:54   ` Eduard Zingerman
2024-11-21  0:53 ` [PATCH bpf-next v1 3/7] bpf: Consolidate RCU and preempt locks in bpf_func_state Kumar Kartikeya Dwivedi
2024-11-21 18:09   ` Eduard Zingerman
2024-11-21 18:12     ` Kumar Kartikeya Dwivedi
2024-11-21 18:54       ` Eduard Zingerman
2024-11-21 22:04         ` Kumar Kartikeya Dwivedi
2024-11-21  0:53 ` [PATCH bpf-next v1 4/7] bpf: Refactor mark_{dynptr,iter}_read Kumar Kartikeya Dwivedi
2024-11-21 18:00   ` Eduard Zingerman
2024-11-21  0:53 ` [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore} Kumar Kartikeya Dwivedi
2024-11-21 20:21   ` Eduard Zingerman
2024-11-21 22:06     ` Kumar Kartikeya Dwivedi
2024-11-21 23:08       ` Eduard Zingerman
2024-11-21 23:12         ` Kumar Kartikeya Dwivedi
2024-11-22  0:30           ` Eduard Zingerman
2024-11-22  0:32       ` Alexei Starovoitov
2024-11-22  0:42         ` Kumar Kartikeya Dwivedi
2024-11-21 22:46   ` kernel test robot
2024-11-21  0:53 ` [PATCH bpf-next v1 6/7] selftests/bpf: Expand coverage of preempt tests to sleepable kfunc Kumar Kartikeya Dwivedi
2024-11-21 20:23   ` Eduard Zingerman
2024-11-21  0:53 ` [PATCH bpf-next v1 7/7] selftests/bpf: Add IRQ save/restore tests Kumar Kartikeya Dwivedi
2024-11-21 20:43   ` Eduard Zingerman
2024-11-21 22:07     ` Kumar Kartikeya Dwivedi
2024-11-21 23:09       ` Eduard Zingerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox