* [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs
@ 2026-02-26 16:14 Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 1/8] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators Puranjay Mohan
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
Changelog:
v4: https://lore.kernel.org/all/20260224212535.1165151-1-puranjay@kernel.org/
Changes in v4 -> v5:
- Base the commits over bpf/master and not bpf-next/master
- Rename KF_FORBID_SLEEP to KF_FORBID_FAULT: mmap_lock is a sleeping
lock (rw_semaphore), so the actual constraint is about faulting (which
would deadlock on mmap_lock re-acquisition), not sleeping (Alexei)
- Rename forbid_sleep/forbid_sleep_count to forbid_fault/forbid_fault_count
throughout verifier and headers
- Change verifier error description from "nosleep region" to
"nofault region"
- Add new preparatory patch (patch 4) to consolidate the open-coded
sleepable check in check_func_call() into in_sleepable_context() +
non_sleepable_context_description(), consistent with
check_helper_call() and check_kfunc_call()
- Update selftest error messages to match: "nofault region" and use
#{{[0-9]+}} regex for helper IDs
v3: https://lore.kernel.org/all/20260223174659.2749964-1-puranjay@kernel.org/
Changes in v3 -> v4:
- In iter_release_acquired_ref(), drop the if (!err) guard before
zeroing iter_st->id; the verifier stops on error anyway (Eduard)
- In process_iter_next_call() DRAINED branch, guard iter_release_acquired_ref()
with is_kfunc_acquire() check for future-proofing (Eduard)
- In check_kfunc_call() KF_RELEASE path, validate that the stack slot is
actually STACK_ITER before operating on it (Mykyta)
- In check_kfunc_call() KF_RELEASE path, use iter_release_acquired_ref()
helper instead of open-coding release_reference() + id zeroing (Eduard)
- In check_kfunc_call() KF_ACQUIRE path, move is_iter_next_kfunc() check
inside the is_kfunc_acquire() block so there is only one place checking
is_kfunc_acquire() (Mykyta)
- Add new patch to consolidate scattered sleepable checks in
check_kfunc_call() into a single in_sleepable_context() check (Eduard)
- Drop separate forbid_sleep_count check in check_kfunc_call(), now
covered by the consolidated in_sleepable_context() check (Eduard)
- Use in_sleepable_context() for global subprog sleep check in
check_func_call() instead of open-coding (Eduard)
- Add runtime test for nested task_vma iterators on the same task to
verify mmap_read_trylock() handles concurrent readers (Alexei)
v2: https://lore.kernel.org/all/20260223160300.2109907-1-puranjay@kernel.org/
Changes in v2 -> v3:
- Rebased on bpf-next/master
v1: https://lore.kernel.org/all/20260218182555.1501495-1-puranjay@kernel.org/
Changes in v1 -> v2:
- Add a patch to consolidate sleepable context error message printing in
check_helper_call(); no functional changes (Eduard)
- In check_kfunc_call() for KF_RELEASE and __iter arg, use release_regno
like dynptr rather than custom handling (Mykyta)
- Fix some comments to follow correct style (Mykyta)
- Move state->forbid_sleep_count-- to release_reference_state() (Eduard)
- Remove error message in check_resource_leak() for forbid_sleep_count
because it is redundant and will never trigger (Eduard)
- Consolidated some checks and prints (Eduard)
Some BPF kfuncs acquire resources that prevent faulting - a lock, a
reference to an object under a lock, a preempt-disable section. Today there
is no way for the verifier to know that holding a particular acquired
reference means faulting is forbidden. Programs either run entirely in
sleepable or non-sleepable context, with no way to express "faulting is
forbidden right now, but will be allowed once I release this reference."
This series adds KF_FORBID_FAULT, a new kfunc flag that can be combined
with KF_ACQUIRE. When a kfunc annotated with KF_ACQUIRE | KF_FORBID_FAULT
is called, the verifier tags the acquired reference with forbid_fault and
increments a per-state forbid_fault_count counter. When the reference is
released (through corresponding KF_RELEASE kfunc), the counter is
decremented. The verifier checks this counter everywhere it decides
whether sleeping is allowed — the implementation conservatively blocks
all sleepable operations while faulting is forbidden.
This is fully generic. Any pair of KF_ACQUIRE/KF_RELEASE kfuncs can opt
into fault prohibition by adding KF_FORBID_FAULT to the acquire side.
To make this useful for iterators specifically, the series first extends
the verifier's iterator support to allow KF_ACQUIRE on _next and KF_RELEASE
on a separate kfunc taking an __iter argument. The verifier tracks the
acquired reference on the iterator's stack slot (st->id) and auto-releases
it on the next _next call and on the DRAINED (NULL) path, so the
acquire/release is transparent to programs that don't need mid-loop
release.
Iterator KF_ACQUIRE support is not useful on its own right now, but it
becomes the foundation for KF_FORBID_FAULT: an iterator whose _next is
annotated with KF_ACQUIRE | KF_FORBID_FAULT can now express "holding this
pointer forbids faulting; calling _release invalidates the pointer and
re-enables sleeping."
The task_vma iterator is the first user. It holds mmap_lock during
iteration, so the verifier blocks sleepable helpers like
bpf_copy_from_user() for the whole loop. Since mmap_lock is a sleeping
lock (rw_semaphore), sleeping itself is fine while holding it. The
actual danger is faulting, which would try to re-acquire mmap_lock and
deadlock.
With this series, a BPF program can release the lock mid-iteration:
	bpf_for_each(task_vma, vma, task, 0) {
		u64 start = vma->vm_start;
		/* faulting forbidden, but vma pointer access allowed */
		bpf_iter_task_vma_release(&___it);
		/* mmap_lock released, vma pointer invalidated */
		/* faulting (and sleeping) is fine here */
		bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
	}
The series is organized as:
Patch 1: KF_ACQUIRE/KF_RELEASE plumbing for iterators in the
verifier. Pure infrastructure, no behavioral change to
existing iterators.
Patch 2: Consolidate sleepable context error message printing
in check_helper_call(), no functional changes.
Patch 3: Consolidate scattered sleepable checks in check_kfunc_call()
into a single in_sleepable_context() check.
Patch 4: Consolidate the open-coded sleepable check in
check_func_call() into in_sleepable_context(), consistent
with patches 2 and 3.
Patch 5: KF_FORBID_FAULT flag and forbid_fault_count machinery.
Generic, works for any KF_ACQUIRE kfunc - iterator or not.
Patch 6: Move mmap_lock acquisition from _new to _next in the
task_vma iterator, preparing for re-acquisition after
release.
Patch 7: Wire up task_vma as the first user: annotate _next with
KF_ACQUIRE | KF_FORBID_FAULT, add bpf_iter_task_vma_release().
Patch 8: Selftests covering the runtime path (release + copy_from_user,
nested iterators on same mm) and verifier rejection of
invalid patterns (sleeping without release, VMA access after
release, double release, release without acquire, nested
iterator interaction).
Puranjay Mohan (8):
bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators
bpf: consolidate sleepable checks in check_helper_call()
bpf: consolidate sleepable checks in check_kfunc_call()
bpf: consolidate sleepable checks in check_func_call()
bpf: Add KF_FORBID_FAULT modifier for KF_ACQUIRE kfuncs
bpf: Move locking to bpf_iter_task_vma_next()
bpf: Add split iteration support to task_vma iterator
selftests/bpf: Add tests for split task_vma iterator
include/linux/bpf_verifier.h | 2 +
include/linux/btf.h | 1 +
kernel/bpf/helpers.c | 3 +-
kernel/bpf/task_iter.c | 44 +++-
kernel/bpf/verifier.c | 196 ++++++++++++------
.../testing/selftests/bpf/bpf_experimental.h | 1 +
.../testing/selftests/bpf/prog_tests/iters.c | 24 +++
.../selftests/bpf/prog_tests/summarization.c | 2 +-
tools/testing/selftests/bpf/progs/irq.c | 4 +-
.../selftests/bpf/progs/iters_task_vma.c | 71 +++++++
.../bpf/progs/iters_task_vma_nosleep.c | 125 +++++++++++
.../selftests/bpf/progs/preempt_lock.c | 6 +-
.../bpf/progs/verifier_async_cb_context.c | 4 +-
13 files changed, 399 insertions(+), 84 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c
base-commit: 8feedae96f872f1b74ad40c72b5cd6a47c44d9dd
--
2.47.3
* [PATCH bpf v5 1/8] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
@ 2026-02-26 16:14 ` Puranjay Mohan
2026-02-27 0:46 ` Alexei Starovoitov
2026-02-26 16:14 ` [PATCH bpf v5 2/8] bpf: consolidate sleepable checks in check_helper_call() Puranjay Mohan
` (6 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
Some iterators hold resources (like mmap_lock in task_vma) that prevent
sleeping. To allow BPF programs to release such resources mid-iteration
and call sleepable helpers, the verifier needs to track acquire/release
semantics on iterator _next pointers.
Repurpose the st->id field on STACK_ITER slots to track the ref_obj_id
of the pointer returned by _next when the kfunc is annotated with
KF_ACQUIRE. This is safe because st->id is initialized to 0 by
__mark_reg_known_zero() in mark_stack_slots_iter() and is not compared
in stacksafe() for STACK_ITER slots.
The lifecycle is:

  _next (KF_ACQUIRE):
    - auto-release old ref if st->id != 0
    - acquire new ref, store ref_obj_id in st->id
    - DRAINED branch: release via st->id, set st->id = 0
    - ACTIVE branch: keeps ref, st->id tracks it

  _release (KF_RELEASE + __iter arg):
    - read st->id, release_reference(), set st->id = 0

  _destroy:
    - release st->id if non-zero before releasing iterator's own ref
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/verifier.c | 71 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 69 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bb12ba020649..c80e96c2c271 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1038,6 +1038,8 @@ static void __mark_reg_known_zero(struct bpf_reg_state *reg);
static bool in_rcu_cs(struct bpf_verifier_env *env);
static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta);
+static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta);
+static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta);
static int mark_stack_slots_iter(struct bpf_verifier_env *env,
struct bpf_kfunc_call_arg_meta *meta,
@@ -1083,6 +1085,22 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
return 0;
}
+/*
+ * Release the acquired reference tracked by iter_st->id, if any.
+ * Used during auto-release in _next, DRAINED handling, and _destroy.
+ */
+static int iter_release_acquired_ref(struct bpf_verifier_env *env,
+ struct bpf_reg_state *iter_st)
+{
+ int err;
+
+ if (!iter_st->id)
+ return 0;
+ err = release_reference(env, iter_st->id);
+ iter_st->id = 0;
+ return err;
+}
+
static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
struct bpf_reg_state *reg, int nr_slots)
{
@@ -1097,8 +1115,14 @@ static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
struct bpf_stack_state *slot = &state->stack[spi - i];
struct bpf_reg_state *st = &slot->spilled_ptr;
- if (i == 0)
+ if (i == 0) {
+ /*
+ * Release any outstanding acquired ref tracked by st->id
+ * before releasing the iterator's own ref.
+ */
+ WARN_ON_ONCE(iter_release_acquired_ref(env, st));
WARN_ON_ONCE(release_reference(env, st->ref_obj_id));
+ }
__mark_reg_not_init(env, st);
@@ -8963,6 +8987,8 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
/* remember meta->iter info for process_iter_next_call() */
meta->iter.spi = spi;
meta->iter.frameno = reg->frameno;
+ if (is_kfunc_release(meta))
+ meta->release_regno = regno;
meta->ref_obj_id = iter_ref_obj_id(env, reg, spi);
if (is_iter_destroy_kfunc(meta)) {
@@ -9198,6 +9224,12 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
/* mark current iter state as drained and assume returned NULL */
cur_iter->iter.state = BPF_ITER_STATE_DRAINED;
__mark_reg_const_zero(env, &cur_fr->regs[BPF_REG_0]);
+ /*
+ * If _next acquired a ref (KF_ACQUIRE), release it in the DRAINED branch since NULL
+ * was returned.
+ */
+ if (is_kfunc_acquire(meta))
+ return iter_release_acquired_ref(env, cur_iter);
return 0;
}
@@ -14214,6 +14246,22 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
if (meta.initialized_dynptr.ref_obj_id) {
err = unmark_stack_slots_dynptr(env, reg);
+ } else if (base_type(reg->type) == PTR_TO_STACK) {
+ struct bpf_func_state *fstate;
+ struct bpf_reg_state *iter_st;
+
+ fstate = env->cur_state->frame[meta.iter.frameno];
+ if (fstate->stack[meta.iter.spi].slot_type[0] != STACK_ITER) {
+ verbose(env, "expected iterator on stack for release\n");
+ return -EINVAL;
+ }
+
+ iter_st = get_iter_from_state(env->cur_state, &meta);
+ if (!iter_st->id) {
+ verbose(env, "no acquired reference to release\n");
+ return -EINVAL;
+ }
+ err = iter_release_acquired_ref(env, iter_st);
} else {
err = release_reference(env, reg->ref_obj_id);
if (err)
@@ -14291,6 +14339,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
__mark_reg_const_zero(env, &regs[BPF_REG_0]);
mark_btf_func_reg_size(env, BPF_REG_0, t->size);
} else if (btf_type_is_ptr(t)) {
+ struct bpf_reg_state *iter_acquire_st = NULL;
+
ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
err = check_special_kfunc(env, &meta, regs, insn_aux, ptr_type, desc_btf);
if (err) {
@@ -14374,7 +14424,21 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
if (is_kfunc_acquire(&meta)) {
- int id = acquire_reference(env, insn_idx);
+ int id;
+
+ /*
+ * For iterators with KF_ACQUIRE, auto-release the previous
+ * iteration's ref before acquiring a new one, and after
+ * acquisition track the new ref on the iter slot.
+ */
+ if (is_iter_next_kfunc(&meta)) {
+ iter_acquire_st = get_iter_from_state(env->cur_state, &meta);
+ err = iter_release_acquired_ref(env, iter_acquire_st);
+ if (err)
+ return err;
+ }
+
+ id = acquire_reference(env, insn_idx);
if (id < 0)
return id;
@@ -14385,6 +14449,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
ref_set_non_owning(env, &regs[BPF_REG_0]);
}
+ if (iter_acquire_st)
+ iter_acquire_st->id = regs[BPF_REG_0].ref_obj_id;
+
if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
regs[BPF_REG_0].id = ++env->id_gen;
} else if (btf_type_is_void(t)) {
--
2.47.3
* [PATCH bpf v5 2/8] bpf: consolidate sleepable checks in check_helper_call()
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 1/8] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators Puranjay Mohan
@ 2026-02-26 16:14 ` Puranjay Mohan
2026-02-26 18:36 ` Eduard Zingerman
2026-02-26 16:14 ` [PATCH bpf v5 3/8] bpf: consolidate sleepable checks in check_kfunc_call() Puranjay Mohan
` (5 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team, Mykyta Yatsenko
check_helper_call() prints a separate error message for each
env->cur_state->active* condition when a sleepable helper is called.
Consolidate all of them into a single print statement.
The check for env->cur_state->active_locks was not part of the removed
print statements, and it will not trigger with the consolidated print
either, because active_locks is checked in do_check() before
check_helper_call() is even reached.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/verifier.c | 44 +++++++------------
.../selftests/bpf/prog_tests/summarization.c | 2 +-
.../bpf/progs/verifier_async_cb_context.c | 4 +-
3 files changed, 20 insertions(+), 30 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c80e96c2c271..1d7f1b5c439a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11593,6 +11593,19 @@ static inline bool in_sleepable_context(struct bpf_verifier_env *env)
in_sleepable(env);
}
+static const char *non_sleepable_context_description(struct bpf_verifier_env *env)
+{
+ if (env->cur_state->active_rcu_locks)
+ return "rcu_read_lock region";
+ if (env->cur_state->active_preempt_locks)
+ return "non-preemptible region";
+ if (env->cur_state->active_irq_id)
+ return "IRQ-disabled region";
+ if (env->cur_state->active_locks)
+ return "lock region";
+ return "non-sleepable prog";
+}
+
static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int *insn_idx_p)
{
@@ -11632,11 +11645,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return -EINVAL;
}
- if (!in_sleepable(env) && fn->might_sleep) {
- verbose(env, "helper call might sleep in a non-sleepable prog\n");
- return -EINVAL;
- }
-
/* With LD_ABS/IND some JITs save/restore skb from r1. */
changes_data = bpf_helper_changes_pkt_data(func_id);
if (changes_data && fn->arg1_type != ARG_PTR_TO_CTX) {
@@ -11653,28 +11661,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return err;
}
- if (env->cur_state->active_rcu_locks) {
- if (fn->might_sleep) {
- verbose(env, "sleepable helper %s#%d in rcu_read_lock region\n",
- func_id_name(func_id), func_id);
- return -EINVAL;
- }
- }
-
- if (env->cur_state->active_preempt_locks) {
- if (fn->might_sleep) {
- verbose(env, "sleepable helper %s#%d in non-preemptible region\n",
- func_id_name(func_id), func_id);
- return -EINVAL;
- }
- }
-
- if (env->cur_state->active_irq_id) {
- if (fn->might_sleep) {
- verbose(env, "sleepable helper %s#%d in IRQ-disabled region\n",
- func_id_name(func_id), func_id);
- return -EINVAL;
- }
+ if (fn->might_sleep && !in_sleepable_context(env)) {
+ verbose(env, "sleepable helper %s#%d in %s\n", func_id_name(func_id), func_id,
+ non_sleepable_context_description(env));
+ return -EINVAL;
}
/* Track non-sleepable context for helpers. */
diff --git a/tools/testing/selftests/bpf/prog_tests/summarization.c b/tools/testing/selftests/bpf/prog_tests/summarization.c
index 5dd6c120a838..6951786044ca 100644
--- a/tools/testing/selftests/bpf/prog_tests/summarization.c
+++ b/tools/testing/selftests/bpf/prog_tests/summarization.c
@@ -58,7 +58,7 @@ static void test_aux(const char *main_prog_name,
* this particular combination can be enabled.
*/
if (!strcmp("might_sleep", replacement) && err) {
- ASSERT_HAS_SUBSTR(log, "helper call might sleep in a non-sleepable prog", "error log");
+ ASSERT_HAS_SUBSTR(log, "sleepable helper bpf_copy_from_user#", "error log");
ASSERT_EQ(err, -EINVAL, "err");
test__skip();
goto out;
diff --git a/tools/testing/selftests/bpf/progs/verifier_async_cb_context.c b/tools/testing/selftests/bpf/progs/verifier_async_cb_context.c
index 39aff82549c9..6bf95550a024 100644
--- a/tools/testing/selftests/bpf/progs/verifier_async_cb_context.c
+++ b/tools/testing/selftests/bpf/progs/verifier_async_cb_context.c
@@ -31,7 +31,7 @@ static int timer_cb(void *map, int *key, struct bpf_timer *timer)
}
SEC("fentry/bpf_fentry_test1")
-__failure __msg("helper call might sleep in a non-sleepable prog")
+__failure __msg("sleepable helper bpf_copy_from_user#{{[0-9]+}} in non-sleepable prog")
int timer_non_sleepable_prog(void *ctx)
{
struct timer_elem *val;
@@ -47,7 +47,7 @@ int timer_non_sleepable_prog(void *ctx)
}
SEC("lsm.s/file_open")
-__failure __msg("helper call might sleep in a non-sleepable prog")
+__failure __msg("sleepable helper bpf_copy_from_user#{{[0-9]+}} in non-sleepable prog")
int timer_sleepable_prog(void *ctx)
{
struct timer_elem *val;
--
2.47.3
* [PATCH bpf v5 3/8] bpf: consolidate sleepable checks in check_kfunc_call()
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 1/8] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 2/8] bpf: consolidate sleepable checks in check_helper_call() Puranjay Mohan
@ 2026-02-26 16:14 ` Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 4/8] bpf: consolidate sleepable checks in check_func_call() Puranjay Mohan
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team, Mykyta Yatsenko
check_kfunc_call() has multiple scattered checks that reject sleepable
kfuncs in various non-sleepable contexts (RCU, preempt-disabled, IRQ-
disabled). These are the same conditions already checked by
in_sleepable_context(), so replace them with a single consolidated
check.
This also simplifies the preempt lock tracking by flattening the nested
if/else structure into a linear chain: preempt_disable increments,
preempt_enable checks for underflow and decrements. The sleepable check
is kept as a separate block since it is logically distinct from the lock
accounting.
No functional change since in_sleepable_context() checks all the same
state (active_rcu_locks, active_preempt_locks, active_locks,
active_irq_id, in_sleepable).
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
---
kernel/bpf/verifier.c | 36 +++++++++++++-----------------------
1 file changed, 13 insertions(+), 23 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1d7f1b5c439a..270d45b345a3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14192,34 +14192,24 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
}));
}
- } else if (sleepable && env->cur_state->active_rcu_locks) {
- verbose(env, "kernel func %s is sleepable within rcu_read_lock region\n", func_name);
- return -EACCES;
- }
-
- if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
- verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
- return -EACCES;
- }
-
- if (env->cur_state->active_preempt_locks) {
- if (preempt_disable) {
- env->cur_state->active_preempt_locks++;
- } else if (preempt_enable) {
- env->cur_state->active_preempt_locks--;
- } else if (sleepable) {
- verbose(env, "kernel func %s is sleepable within non-preemptible region\n", func_name);
- return -EACCES;
- }
} else if (preempt_disable) {
env->cur_state->active_preempt_locks++;
} else if (preempt_enable) {
- verbose(env, "unmatched attempt to enable preemption (kernel function %s)\n", func_name);
- return -EINVAL;
+ if (env->cur_state->active_preempt_locks == 0) {
+ verbose(env, "unmatched attempt to enable preemption (kernel function %s)\n", func_name);
+ return -EINVAL;
+ }
+ env->cur_state->active_preempt_locks--;
}
- if (env->cur_state->active_irq_id && sleepable) {
- verbose(env, "kernel func %s is sleepable within IRQ-disabled region\n", func_name);
+ if (sleepable && !in_sleepable_context(env)) {
+ verbose(env, "kernel func %s is sleepable within %s\n",
+ func_name, non_sleepable_context_description(env));
+ return -EACCES;
+ }
+
+ if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
+ verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
return -EACCES;
}
--
2.47.3
* [PATCH bpf v5 4/8] bpf: consolidate sleepable checks in check_func_call()
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
` (2 preceding siblings ...)
2026-02-26 16:14 ` [PATCH bpf v5 3/8] bpf: consolidate sleepable checks in check_kfunc_call() Puranjay Mohan
@ 2026-02-26 16:14 ` Puranjay Mohan
2026-02-26 19:00 ` Eduard Zingerman
2026-02-26 16:14 ` [PATCH bpf v5 5/8] bpf: Add KF_FORBID_FAULT modifier for KF_ACQUIRE kfuncs Puranjay Mohan
` (3 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
The sleepable context check for global function calls in
check_func_call() open-codes the same checks that in_sleepable_context()
already performs. Replace the open-coded check with a call to
in_sleepable_context() and use non_sleepable_context_description() for
the error message, consistent with check_helper_call() and
check_kfunc_call().
Note that in_sleepable_context() also checks active_locks, which
overlaps with the existing active_locks check above it. However, the two
checks serve different purposes: the active_locks check rejects all
global function calls while holding a lock (not just sleepable ones), so
it must remain as a separate guard.
Update the expected error messages in the irq and preempt_lock selftests
to match.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/verifier.c | 11 +++++------
tools/testing/selftests/bpf/progs/irq.c | 4 ++--
tools/testing/selftests/bpf/progs/preempt_lock.c | 6 +++---
3 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 270d45b345a3..061f93d0c2c2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -210,6 +210,8 @@ static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env);
static int ref_set_non_owning(struct bpf_verifier_env *env,
struct bpf_reg_state *reg);
static bool is_trusted_reg(const struct bpf_reg_state *reg);
+static inline bool in_sleepable_context(struct bpf_verifier_env *env);
+static const char *non_sleepable_context_description(struct bpf_verifier_env *env);
static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
{
@@ -10865,12 +10867,9 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return -EINVAL;
}
- if (env->subprog_info[subprog].might_sleep &&
- (env->cur_state->active_rcu_locks || env->cur_state->active_preempt_locks ||
- env->cur_state->active_irq_id || !in_sleepable(env))) {
- verbose(env, "global functions that may sleep are not allowed in non-sleepable context,\n"
- "i.e., in a RCU/IRQ/preempt-disabled section, or in\n"
- "a non-sleepable BPF program context\n");
+ if (env->subprog_info[subprog].might_sleep && !in_sleepable_context(env)) {
+ verbose(env, "sleepable global function %s() called in %s\n",
+ sub_name, non_sleepable_context_description(env));
return -EINVAL;
}
diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c
index 74d912b22de9..e11e82d98904 100644
--- a/tools/testing/selftests/bpf/progs/irq.c
+++ b/tools/testing/selftests/bpf/progs/irq.c
@@ -490,7 +490,7 @@ int irq_non_sleepable_global_subprog(void *ctx)
}
SEC("?syscall")
-__failure __msg("global functions that may sleep are not allowed in non-sleepable context")
+__failure __msg("sleepable global function")
int irq_sleepable_helper_global_subprog(void *ctx)
{
unsigned long flags;
@@ -502,7 +502,7 @@ int irq_sleepable_helper_global_subprog(void *ctx)
}
SEC("?syscall")
-__failure __msg("global functions that may sleep are not allowed in non-sleepable context")
+__failure __msg("sleepable global function")
int irq_sleepable_global_subprog_indirect(void *ctx)
{
unsigned long flags;
diff --git a/tools/testing/selftests/bpf/progs/preempt_lock.c b/tools/testing/selftests/bpf/progs/preempt_lock.c
index 7d04254e61f1..6d5fce7e6ffc 100644
--- a/tools/testing/selftests/bpf/progs/preempt_lock.c
+++ b/tools/testing/selftests/bpf/progs/preempt_lock.c
@@ -177,7 +177,7 @@ global_subprog_calling_sleepable_global(int i)
}
SEC("?syscall")
-__failure __msg("global functions that may sleep are not allowed in non-sleepable context")
+__failure __msg("sleepable global function")
int preempt_global_sleepable_helper_subprog(struct __sk_buff *ctx)
{
preempt_disable();
@@ -188,7 +188,7 @@ int preempt_global_sleepable_helper_subprog(struct __sk_buff *ctx)
}
SEC("?syscall")
-__failure __msg("global functions that may sleep are not allowed in non-sleepable context")
+__failure __msg("sleepable global function")
int preempt_global_sleepable_kfunc_subprog(struct __sk_buff *ctx)
{
preempt_disable();
@@ -199,7 +199,7 @@ int preempt_global_sleepable_kfunc_subprog(struct __sk_buff *ctx)
}
SEC("?syscall")
-__failure __msg("global functions that may sleep are not allowed in non-sleepable context")
+__failure __msg("sleepable global function")
int preempt_global_sleepable_subprog_indirect(struct __sk_buff *ctx)
{
preempt_disable();
--
2.47.3
* [PATCH bpf v5 5/8] bpf: Add KF_FORBID_FAULT modifier for KF_ACQUIRE kfuncs
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
` (3 preceding siblings ...)
2026-02-26 16:14 ` [PATCH bpf v5 4/8] bpf: consolidate sleepable checks in check_func_call() Puranjay Mohan
@ 2026-02-26 16:14 ` Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 6/8] bpf: Move locking to bpf_iter_task_vma_next() Puranjay Mohan
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
With KF_ACQUIRE support for iterators in place, we need a way to tell
the verifier that holding a particular acquired reference forbids
faulting. For example, task_vma's _next holds mmap_lock, so any
operation that might trigger a page fault between _next and _release
must be rejected to avoid deadlocking on mmap_lock re-acquisition.
Note that mmap_lock is a sleeping lock (rw_semaphore), so sleeping
itself is fine while holding it. The actual constraint is about
faulting, not sleeping. The flag is named KF_FORBID_FAULT to reflect
this. The current implementation conservatively blocks all sleepable
operations while the reference is held.
Add a KF_FORBID_FAULT flag (1 << 17) that can be combined with
KF_ACQUIRE. When acquire_reference() is called for such a kfunc, the
reference is tagged with forbid_fault=true and a per-state
forbid_fault_count counter is incremented. When the reference is
released through release_reference_state(), the counter is decremented.
The counter is checked wherever the verifier decides if sleeping is
allowed.
This is generic and works for both iterator and non-iterator kfuncs.
For iterators, the auto-release and explicit _release from the previous
commit handle the counter decrement automatically via
release_reference().
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
include/linux/bpf_verifier.h | 2 ++
include/linux/btf.h | 1 +
kernel/bpf/verifier.c | 36 +++++++++++++++++++++++++++++-------
3 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index ef8e45a362d9..11102e1b53ef 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -261,6 +261,7 @@ struct bpf_reference_state {
* it matches on unlock.
*/
void *ptr;
+ bool forbid_fault; /* ref prevents faulting while held */
};
struct bpf_retval_range {
@@ -421,6 +422,7 @@ struct bpf_verifier_state {
u32 active_lock_id;
void *active_lock_ptr;
u32 active_rcu_locks;
+ u32 forbid_fault_count;
bool speculative;
bool in_sleepable;
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 48108471c5b1..fec8298692a2 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -79,6 +79,7 @@
#define KF_ARENA_ARG1 (1 << 14) /* kfunc takes an arena pointer as its first argument */
#define KF_ARENA_ARG2 (1 << 15) /* kfunc takes an arena pointer as its second argument */
#define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
+#define KF_FORBID_FAULT (1 << 17) /* acquired reference forbids faulting while held */
/*
* Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 061f93d0c2c2..cb69761332ab 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -202,7 +202,7 @@ struct bpf_verifier_stack_elem {
#define BPF_PRIV_STACK_MIN_SIZE 64
-static int acquire_reference(struct bpf_verifier_env *env, int insn_idx);
+static int acquire_reference(struct bpf_verifier_env *env, int insn_idx, bool forbid_fault);
static int release_reference_nomark(struct bpf_verifier_state *state, int ref_obj_id);
static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
static void invalidate_non_owning_refs(struct bpf_verifier_env *env);
@@ -807,7 +807,7 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
if (clone_ref_obj_id)
id = clone_ref_obj_id;
else
- id = acquire_reference(env, insn_idx);
+ id = acquire_reference(env, insn_idx, false);
if (id < 0)
return id;
@@ -1055,7 +1055,7 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
if (spi < 0)
return spi;
- id = acquire_reference(env, insn_idx);
+ id = acquire_reference(env, insn_idx, false);
if (id < 0)
return id;
@@ -1476,6 +1476,7 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf
dst->active_irq_id = src->active_irq_id;
dst->active_lock_id = src->active_lock_id;
dst->active_lock_ptr = src->active_lock_ptr;
+ dst->forbid_fault_count = src->forbid_fault_count;
return 0;
}
@@ -1549,7 +1550,7 @@ static struct bpf_reference_state *acquire_reference_state(struct bpf_verifier_e
return &state->refs[new_ofs];
}
-static int acquire_reference(struct bpf_verifier_env *env, int insn_idx)
+static int acquire_reference(struct bpf_verifier_env *env, int insn_idx, bool forbid_fault)
{
struct bpf_reference_state *s;
@@ -1558,6 +1559,9 @@ static int acquire_reference(struct bpf_verifier_env *env, int insn_idx)
return -ENOMEM;
s->type = REF_TYPE_PTR;
s->id = ++env->id_gen;
+ s->forbid_fault = forbid_fault;
+ if (forbid_fault)
+ env->cur_state->forbid_fault_count++;
return s->id;
}
@@ -1600,6 +1604,9 @@ static void release_reference_state(struct bpf_verifier_state *state, int idx)
int last_idx;
size_t rem;
+ if (state->refs[idx].forbid_fault)
+ state->forbid_fault_count--;
+
/* IRQ state requires the relative ordering of elements remaining the
* same, since it relies on the refs array to behave as a stack, so that
* it can detect out-of-order IRQ restore. Hence use memmove to shift
@@ -11589,6 +11596,7 @@ static inline bool in_sleepable_context(struct bpf_verifier_env *env)
!env->cur_state->active_preempt_locks &&
!env->cur_state->active_locks &&
!env->cur_state->active_irq_id &&
+ !env->cur_state->forbid_fault_count &&
in_sleepable(env);
}
@@ -11602,6 +11610,8 @@ static const char *non_sleepable_context_description(struct bpf_verifier_env *en
return "IRQ-disabled region";
if (env->cur_state->active_locks)
return "lock region";
+ if (env->cur_state->forbid_fault_count)
+ return "nofault region";
return "non-sleepable prog";
}
@@ -12047,7 +12057,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
/* For release_reference() */
regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
} else if (is_acquire_function(func_id, meta.map.ptr)) {
- int id = acquire_reference(env, insn_idx);
+ int id = acquire_reference(env, insn_idx, false);
if (id < 0)
return id;
@@ -12152,6 +12162,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
return meta->kfunc_flags & KF_RELEASE;
}
+static bool is_kfunc_forbid_fault(struct bpf_kfunc_call_arg_meta *meta)
+{
+ return meta->kfunc_flags & KF_FORBID_FAULT;
+}
+
static bool is_kfunc_sleepable(struct bpf_kfunc_call_arg_meta *meta)
{
return meta->kfunc_flags & KF_SLEEPABLE;
@@ -14403,6 +14418,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
if (is_kfunc_acquire(&meta)) {
+ bool forbid_fault = is_kfunc_forbid_fault(&meta);
int id;
/*
@@ -14417,7 +14433,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return err;
}
- id = acquire_reference(env, insn_idx);
+ id = acquire_reference(env, insn_idx, forbid_fault);
if (id < 0)
return id;
@@ -20091,6 +20107,9 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c
if (old->active_rcu_locks != cur->active_rcu_locks)
return false;
+ if (old->forbid_fault_count != cur->forbid_fault_count)
+ return false;
+
if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap))
return false;
@@ -20104,6 +20123,9 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c
return false;
switch (old->refs[i].type) {
case REF_TYPE_PTR:
+ if (old->refs[i].forbid_fault != cur->refs[i].forbid_fault)
+ return false;
+ break;
case REF_TYPE_IRQ:
break;
case REF_TYPE_LOCK:
@@ -24629,7 +24651,7 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
if (!subprog && env->prog->type == BPF_PROG_TYPE_STRUCT_OPS) {
for (i = 0; i < aux->ctx_arg_info_size; i++)
aux->ctx_arg_info[i].ref_obj_id = aux->ctx_arg_info[i].refcounted ?
- acquire_reference(env, 0) : 0;
+ acquire_reference(env, 0, false) : 0;
}
ret = do_check(env);
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH bpf v5 6/8] bpf: Move locking to bpf_iter_task_vma_next()
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
` (4 preceding siblings ...)
2026-02-26 16:14 ` [PATCH bpf v5 5/8] bpf: Add KF_FORBID_FAULT modifier for KF_ACQUIRE kfuncs Puranjay Mohan
@ 2026-02-26 16:14 ` Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 7/8] bpf: Add split iteration support to task_vma iterator Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 8/8] selftests/bpf: Add tests for split " Puranjay Mohan
7 siblings, 0 replies; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
The current implementation of the task_vma iterator takes the mmap_lock in
the _new() function and holds it for the entire lifetime of the
iterator. The next commits will allow releasing the lock in the middle
of the iteration, which means a subsequent _next() call must re-take
the mmap_lock.
Move the mmap_lock setup to bpf_iter_task_vma_next(), so the lock is
taken lazily on the first call and re-taken after a release.
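A rough userspace model of the resulting lock lifecycle, for illustration only: all `toy_*`/`fake_*` names are hypothetical stand-ins (fake_trylock()/fake_unlock() play the role of bpf_mmap_unlock_get_irq_work() + mmap_read_trylock() and bpf_mmap_unlock_mm()), and fake_trylock() always succeeds here whereas the real trylock can fail and make _next() return NULL.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins -- NOT kernel code. */
struct fake_mm { int readers; };

static bool fake_trylock(struct fake_mm *mm) { mm->readers++; return true; }
static void fake_unlock(struct fake_mm *mm)  { mm->readers--; }

/* Mirrors the locked/last_addr fields added to bpf_iter_task_vma_kern_data. */
struct toy_iter {
	struct fake_mm *mm;
	unsigned long long last_addr;
	bool locked;
};

static void toy_new(struct toy_iter *it, struct fake_mm *mm,
		    unsigned long long addr)
{
	it->mm = mm;
	it->last_addr = addr;	/* the lock is no longer taken here */
	it->locked = false;
}

/* Takes the lock lazily; pretends every "vma" spans one 4K page. */
static unsigned long long toy_next(struct toy_iter *it)
{
	if (!it->locked) {
		if (!fake_trylock(it->mm))
			return 0;
		it->locked = true;	/* resume from last_addr */
	}
	it->last_addr += 0x1000;
	return it->last_addr;
}

static void toy_release(struct toy_iter *it)
{
	if (it->locked) {
		fake_unlock(it->mm);
		it->locked = false;
	}
}

static void toy_destroy(struct toy_iter *it)
{
	toy_release(it);	/* unlock only if still locked */
}
```

The key property the model shows is that last_addr survives a release, so a later _next() can re-lock and continue from where the iteration left off.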
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/task_iter.c | 34 +++++++++++++++++++++++-----------
1 file changed, 23 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 98d9b4c0daff..a85115c191e4 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -799,6 +799,8 @@ struct bpf_iter_task_vma_kern_data {
struct mm_struct *mm;
struct mmap_unlock_irq_work *work;
struct vma_iterator vmi;
+ u64 last_addr;
+ bool locked;
};
struct bpf_iter_task_vma {
@@ -819,7 +821,6 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
struct task_struct *task, u64 addr)
{
struct bpf_iter_task_vma_kern *kit = (void *)it;
- bool irq_work_busy = false;
int err;
BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
@@ -840,14 +841,8 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
goto err_cleanup_iter;
}
- /* kit->data->work == NULL is valid after bpf_mmap_unlock_get_irq_work */
- irq_work_busy = bpf_mmap_unlock_get_irq_work(&kit->data->work);
- if (irq_work_busy || !mmap_read_trylock(kit->data->mm)) {
- err = -EBUSY;
- goto err_cleanup_iter;
- }
-
- vma_iter_init(&kit->data->vmi, kit->data->mm, addr);
+ kit->data->locked = false;
+ kit->data->last_addr = addr;
return 0;
err_cleanup_iter:
@@ -862,10 +857,26 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
__bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it)
{
struct bpf_iter_task_vma_kern *kit = (void *)it;
+ struct vm_area_struct *vma;
if (!kit->data) /* bpf_iter_task_vma_new failed */
return NULL;
- return vma_next(&kit->data->vmi);
+
+ if (!kit->data->locked) {
+ bool irq_work_busy;
+
+ irq_work_busy = bpf_mmap_unlock_get_irq_work(&kit->data->work);
+ if (irq_work_busy || !mmap_read_trylock(kit->data->mm))
+ return NULL;
+
+ kit->data->locked = true;
+ vma_iter_init(&kit->data->vmi, kit->data->mm, kit->data->last_addr);
+ }
+
+ vma = vma_next(&kit->data->vmi);
+ if (vma)
+ kit->data->last_addr = vma->vm_end;
+ return vma;
}
__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
@@ -873,7 +884,8 @@ __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
struct bpf_iter_task_vma_kern *kit = (void *)it;
if (kit->data) {
- bpf_mmap_unlock_mm(kit->data->work, kit->data->mm);
+ if (kit->data->locked)
+ bpf_mmap_unlock_mm(kit->data->work, kit->data->mm);
put_task_struct(kit->data->task);
bpf_mem_free(&bpf_global_ma, kit->data);
}
--
2.47.3
* [PATCH bpf v5 7/8] bpf: Add split iteration support to task_vma iterator
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
` (5 preceding siblings ...)
2026-02-26 16:14 ` [PATCH bpf v5 6/8] bpf: Move locking to bpf_iter_task_vma_next() Puranjay Mohan
@ 2026-02-26 16:14 ` Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 8/8] selftests/bpf: Add tests for split " Puranjay Mohan
7 siblings, 0 replies; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
Wire up the task_vma iterator to use the KF_ACQUIRE/KF_RELEASE and
KF_FORBID_FAULT infrastructure from the previous commits.
Annotate bpf_iter_task_vma_next with KF_ACQUIRE | KF_FORBID_FAULT so
each returned VMA pointer becomes a tracked reference that prevents
faulting. Add bpf_iter_task_vma_release(it__iter) with KF_RELEASE that
releases the mmap read lock, drops the acquired reference, and
re-enables sleeping. The next call to _next will re-acquire the lock via
mmap_read_trylock().
This enables a split iteration pattern:
bpf_for_each(task_vma, vma, task, 0) {
u64 start = vma->vm_start;
/* vma access allowed, sleeping forbidden */
bpf_iter_task_vma_release(&___it);
/* vma access forbidden, sleeping allowed */
bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
}
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/helpers.c | 3 ++-
kernel/bpf/task_iter.c | 10 ++++++++++
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6eb6c82ed2ee..e4d0a3a62428 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -4605,8 +4605,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW | KF_RCU)
-BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL | KF_ACQUIRE | KF_FORBID_FAULT)
BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_release, KF_RELEASE)
#ifdef CONFIG_CGROUPS
BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index a85115c191e4..55b264d634db 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -879,6 +879,16 @@ __bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_v
return vma;
}
+__bpf_kfunc void bpf_iter_task_vma_release(struct bpf_iter_task_vma *it__iter)
+{
+ struct bpf_iter_task_vma_kern *kit = (void *)it__iter;
+
+ if (kit->data && kit->data->locked) {
+ bpf_mmap_unlock_mm(kit->data->work, kit->data->mm);
+ kit->data->locked = false;
+ }
+}
+
__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
{
struct bpf_iter_task_vma_kern *kit = (void *)it;
--
2.47.3
* [PATCH bpf v5 8/8] selftests/bpf: Add tests for split task_vma iterator
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
` (6 preceding siblings ...)
2026-02-26 16:14 ` [PATCH bpf v5 7/8] bpf: Add split iteration support to task_vma iterator Puranjay Mohan
@ 2026-02-26 16:14 ` Puranjay Mohan
7 siblings, 0 replies; 12+ messages in thread
From: Puranjay Mohan @ 2026-02-26 16:14 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
Add runtime and verifier tests for the split task_vma iteration model.
The runtime tests are:
- iter_task_vma_release_and_copy: iterates VMAs, saves vm_start,
releases mmap_lock via bpf_iter_task_vma_release(), then calls
bpf_copy_from_user() on the saved address.
- iter_task_vma_nested: two nested task_vma iterators on the same task
verify that mmap_read_trylock() succeeds for the second reader on
the same mm.
The verifier tests cover:
- nosleep_iter_release_then_sleep: release then sleep is accepted
- nosleep_iter_sleep_without_release: sleep without release is rejected
with "in nofault region"
- nosleep_iter_vma_access_after_release: VMA access after release is
rejected with "invalid mem access 'scalar'"
- nosleep_iter_double_release: double release is rejected
- nosleep_iter_release_without_acquire: release without prior _next is
rejected
- nosleep_iter_nested_release_inner: releasing inner iterator does not
allow sleeping when outer still holds a nofault reference
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
.../testing/selftests/bpf/bpf_experimental.h | 1 +
.../testing/selftests/bpf/prog_tests/iters.c | 24 ++++
.../selftests/bpf/progs/iters_task_vma.c | 71 ++++++++++
.../bpf/progs/iters_task_vma_nosleep.c | 125 ++++++++++++++++++
4 files changed, 221 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 4b7210c318dd..c845553bb932 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -165,6 +165,7 @@ extern int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
struct task_struct *task,
__u64 addr) __ksym;
extern struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it) __ksym;
+extern void bpf_iter_task_vma_release(struct bpf_iter_task_vma *it) __ksym;
extern void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __ksym;
/* Convenience macro to wrap over bpf_obj_drop_impl */
diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index a539980a2fbe..ab24871762c8 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -21,6 +21,7 @@
#include "iters_css_task.skel.h"
#include "iters_css.skel.h"
#include "iters_task_failure.skel.h"
+#include "iters_task_vma_nosleep.skel.h"
static void subtest_num_iters(void)
{
@@ -152,6 +153,28 @@ static void subtest_task_vma_iters(void)
if (!ASSERT_EQ(skel->bss->vmas_seen, seen, "vmas_seen_eq"))
goto cleanup;
+ /* Test release+sleepable: trigger the release_and_copy program */
+ skel->bss->release_vmas_seen = 0;
+ err = iters_task_vma__attach(skel);
+ if (!ASSERT_OK(err, "skel_reattach"))
+ goto cleanup;
+
+ getpgid(skel->bss->target_pid);
+ iters_task_vma__detach(skel);
+
+ ASSERT_GT(skel->bss->release_vmas_seen, 0, "release_vmas_seen_gt_zero");
+
+ /* Test nested iterators on same task (same mmap_lock) */
+ skel->bss->nested_vmas_seen = 0;
+ err = iters_task_vma__attach(skel);
+ if (!ASSERT_OK(err, "skel_reattach_nested"))
+ goto cleanup;
+
+ getpgid(skel->bss->target_pid);
+ iters_task_vma__detach(skel);
+
+ ASSERT_GT(skel->bss->nested_vmas_seen, 0, "nested_vmas_seen_gt_zero");
+
cleanup:
if (f)
fclose(f);
@@ -322,4 +345,5 @@ void test_iters(void)
if (test__start_subtest("css"))
subtest_css_iters();
RUN_TESTS(iters_task_failure);
+ RUN_TESTS(iters_task_vma_nosleep);
}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_vma.c b/tools/testing/selftests/bpf/progs/iters_task_vma.c
index dc0c3691dcc2..baecd79d2998 100644
--- a/tools/testing/selftests/bpf/progs/iters_task_vma.c
+++ b/tools/testing/selftests/bpf/progs/iters_task_vma.c
@@ -40,4 +40,75 @@ int iter_task_vma_for_each(const void *ctx)
return 0;
}
+unsigned int release_vmas_seen = 0;
+
+SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_task_vma_release_and_copy(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+ unsigned int seen = 0;
+
+ if (task->pid != target_pid)
+ return 0;
+
+ if (release_vmas_seen)
+ return 0;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ __u64 start;
+ char buf[8];
+
+ if (bpf_cmp_unlikely(seen, >=, 1000))
+ break;
+
+ /* Phase 1: mmap_lock held, read VMA data */
+ start = vma->vm_start;
+
+ /* Transition: release mmap_lock */
+ bpf_iter_task_vma_release(&___it);
+ /* VMA pointer is now invalid; sleepable helpers allowed */
+
+ /* Phase 2: mmap_lock released, sleepable call */
+ bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
+
+ seen++;
+ }
+
+ release_vmas_seen = seen;
+ return 0;
+}
+
+/*
+ * Test nested task_vma iterators on the same task. Both iterators take
+ * mmap_read_trylock() on the same mm; the rwsem should allow the second
+ * reader and the inner loop should observe at least one VMA.
+ */
+unsigned int nested_vmas_seen = 0;
+
+SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_task_vma_nested(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma1, *vma2;
+ unsigned int seen = 0;
+
+ if (task->pid != target_pid)
+ return 0;
+
+ if (nested_vmas_seen)
+ return 0;
+
+ bpf_for_each(task_vma, vma1, task, 0) {
+ bpf_for_each(task_vma, vma2, task, 0) {
+ seen++;
+ break;
+ }
+ break;
+ }
+
+ nested_vmas_seen = seen;
+ return 0;
+}
+
char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c b/tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c
new file mode 100644
index 000000000000..fc02702b233c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include "bpf_experimental.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+/* Negative test: sleepable call without release should be rejected */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("sleepable helper bpf_copy_from_user#{{[0-9]+}} in nofault region")
+int nosleep_iter_sleep_without_release(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ char buf[8];
+
+ /* Attempt to call sleepable helper without releasing mmap_lock.
+ * Verifier should reject this.
+ */
+ bpf_copy_from_user(&buf, sizeof(buf), (void *)vma->vm_start);
+ break;
+ }
+ return 0;
+}
+
+/* Negative test: VMA access after release should be rejected */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("invalid mem access 'scalar'")
+int nosleep_iter_vma_access_after_release(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+ __u64 val = 0;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ bpf_iter_task_vma_release(&___it);
+ /* VMA pointer is now invalid. Accessing it should be rejected. */
+ val = vma->vm_start;
+ break;
+ }
+ __sink(val);
+ return 0;
+}
+
+/* Positive test: release then sleepable call should succeed */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__success
+int nosleep_iter_release_then_sleep(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ __u64 start = vma->vm_start;
+ char buf[8];
+
+ bpf_iter_task_vma_release(&___it);
+ bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
+ break;
+ }
+ return 0;
+}
+
+/* Negative test: double release should be rejected */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("no acquired reference to release")
+int nosleep_iter_double_release(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ bpf_iter_task_vma_release(&___it);
+ bpf_iter_task_vma_release(&___it);
+ break;
+ }
+ return 0;
+}
+
+/* Negative test: release without any prior _next acquire */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("no acquired reference to release")
+int nosleep_iter_release_without_acquire(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct bpf_iter_task_vma it;
+
+ bpf_iter_task_vma_new(&it, task, 0);
+ /* No _next called, so no acquired reference exists */
+ bpf_iter_task_vma_release(&it);
+ bpf_iter_task_vma_destroy(&it);
+ return 0;
+}
+
+/* Negative test: nested iterators, releasing inner should not allow sleeping
+ * because the outer iterator still holds a nofault reference.
+ */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("sleepable helper bpf_copy_from_user#{{[0-9]+}} in nofault region")
+int nosleep_iter_nested_release_inner(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma_outer, *vma_inner;
+
+ bpf_for_each(task_vma, vma_outer, task, 0) {
+ bpf_for_each(task_vma, vma_inner, task, 0) {
+ __u64 start = vma_inner->vm_start;
+ char buf[8];
+
+ bpf_iter_task_vma_release(&___it);
+ /* Inner released, but outer still holds nofault ref.
+ * Sleeping should still be forbidden.
+ */
+ bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
+ break;
+ }
+ break;
+ }
+ return 0;
+}
--
2.47.3
* Re: [PATCH bpf v5 2/8] bpf: consolidate sleepable checks in check_helper_call()
2026-02-26 16:14 ` [PATCH bpf v5 2/8] bpf: consolidate sleepable checks in check_helper_call() Puranjay Mohan
@ 2026-02-26 18:36 ` Eduard Zingerman
0 siblings, 0 replies; 12+ messages in thread
From: Eduard Zingerman @ 2026-02-26 18:36 UTC (permalink / raw)
To: Puranjay Mohan, bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, kernel-team, Mykyta Yatsenko
On Thu, 2026-02-26 at 08:14 -0800, Puranjay Mohan wrote:
> check_helper_call() prints the error message for every
> env->cur_state->active* element when calling a sleepable helper.
> Consolidate all of them into a single print statement.
>
> The check for env->cur_state->active_locks was not part of the removed
> print statements and will not be triggered with the consolidated print
> as well because it is checked in do_check() before check_helper_call()
> is even reached.
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
* Re: [PATCH bpf v5 4/8] bpf: consolidate sleepable checks in check_func_call()
2026-02-26 16:14 ` [PATCH bpf v5 4/8] bpf: consolidate sleepable checks in check_func_call() Puranjay Mohan
@ 2026-02-26 19:00 ` Eduard Zingerman
0 siblings, 0 replies; 12+ messages in thread
From: Eduard Zingerman @ 2026-02-26 19:00 UTC (permalink / raw)
To: Puranjay Mohan, bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, kernel-team
On Thu, 2026-02-26 at 08:14 -0800, Puranjay Mohan wrote:
> The sleepable context check for global function calls in
> check_func_call() open-codes the same checks that in_sleepable_context()
> already performs. Replace the open-coded check with a call to
> in_sleepable_context() and use non_sleepable_context_description() for
> the error message, consistent with check_helper_call() and
> check_kfunc_call().
>
> Note that in_sleepable_context() also checks active_locks, which
> overlaps with the existing active_locks check above it. However, the two
> checks serve different purposes: the active_locks check rejects all
> global function calls while holding a lock (not just sleepable ones), so
> it must remain as a separate guard.
>
> Update the expected error messages in the irq and preempt_lock selftests
> to match.
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
* Re: [PATCH bpf v5 1/8] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators
2026-02-26 16:14 ` [PATCH bpf v5 1/8] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators Puranjay Mohan
@ 2026-02-27 0:46 ` Alexei Starovoitov
0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2026-02-27 0:46 UTC (permalink / raw)
To: Puranjay Mohan
Cc: bpf, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team
On Thu, Feb 26, 2026 at 8:15 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Some iterators hold resources (like mmap_lock in task_vma) that prevent
> sleeping. To allow BPF programs to release such resources mid-iteration
> and call sleepable helpers, the verifier needs to track acquire/release
> semantics on iterator _next pointers.
>
> Repurpose the st->id field on STACK_ITER slots to track the ref_obj_id
> of the pointer returned by _next when the kfunc is annotated with
> KF_ACQUIRE. This is safe because st->id is initialized to 0 by
> __mark_reg_known_zero() in mark_stack_slots_iter() and is not compared
> in stacksafe() for STACK_ITER slots.
>
> The lifecycle is:
>
> _next (KF_ACQUIRE):
> - auto-release old ref if st->id != 0
> - acquire new ref, store ref_obj_id in st->id
> - DRAINED branch: release via st->id, set st->id = 0
> - ACTIVE branch: keeps ref, st->id tracks it
>
> _release (KF_RELEASE + __iter arg):
> - read st->id, release_reference(), set st->id = 0
>
> _destroy:
> - release st->id if non-zero before releasing iterator's own ref
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
> kernel/bpf/verifier.c | 71 +++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 69 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index bb12ba020649..c80e96c2c271 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1038,6 +1038,8 @@ static void __mark_reg_known_zero(struct bpf_reg_state *reg);
> static bool in_rcu_cs(struct bpf_verifier_env *env);
>
> static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta);
> +static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta);
> +static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta);
>
> static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> struct bpf_kfunc_call_arg_meta *meta,
> @@ -1083,6 +1085,22 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> return 0;
> }
>
> +/*
> + * Release the acquired reference tracked by iter_st->id, if any.
> + * Used during auto-release in _next, DRAINED handling, and _destroy.
> + */
> +static int iter_release_acquired_ref(struct bpf_verifier_env *env,
> + struct bpf_reg_state *iter_st)
> +{
> + int err;
> +
> + if (!iter_st->id)
> + return 0;
> + err = release_reference(env, iter_st->id);
> + iter_st->id = 0;
> + return err;
> +}
> +
> static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
> struct bpf_reg_state *reg, int nr_slots)
> {
> @@ -1097,8 +1115,14 @@ static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
> struct bpf_stack_state *slot = &state->stack[spi - i];
> struct bpf_reg_state *st = &slot->spilled_ptr;
>
> - if (i == 0)
> + if (i == 0) {
> + /*
> + * Release any outstanding acquired ref tracked by st->id
> + * before releasing the iterator's own ref.
> + */
> + WARN_ON_ONCE(iter_release_acquired_ref(env, st));
> WARN_ON_ONCE(release_reference(env, st->ref_obj_id));
> + }
>
> __mark_reg_not_init(env, st);
>
> @@ -8963,6 +8987,8 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
> /* remember meta->iter info for process_iter_next_call() */
> meta->iter.spi = spi;
> meta->iter.frameno = reg->frameno;
> + if (is_kfunc_release(meta))
> + meta->release_regno = regno;
> meta->ref_obj_id = iter_ref_obj_id(env, reg, spi);
>
> if (is_iter_destroy_kfunc(meta)) {
> @@ -9198,6 +9224,12 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> /* mark current iter state as drained and assume returned NULL */
> cur_iter->iter.state = BPF_ITER_STATE_DRAINED;
> __mark_reg_const_zero(env, &cur_fr->regs[BPF_REG_0]);
> + /*
> + * If _next acquired a ref (KF_ACQUIRE), release it in the DRAINED branch since NULL
> + * was returned.
> + */
> + if (is_kfunc_acquire(meta))
> + return iter_release_acquired_ref(env, cur_iter);
>
> return 0;
> }
> @@ -14214,6 +14246,22 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>
> if (meta.initialized_dynptr.ref_obj_id) {
> err = unmark_stack_slots_dynptr(env, reg);
> + } else if (base_type(reg->type) == PTR_TO_STACK) {
> + struct bpf_func_state *fstate;
> + struct bpf_reg_state *iter_st;
> +
> + fstate = env->cur_state->frame[meta.iter.frameno];
> + if (fstate->stack[meta.iter.spi].slot_type[0] != STACK_ITER) {
> + verbose(env, "expected iterator on stack for release\n");
> + return -EINVAL;
> + }
> +
> + iter_st = get_iter_from_state(env->cur_state, &meta);
> + if (!iter_st->id) {
> + verbose(env, "no acquired reference to release\n");
> + return -EINVAL;
> + }
> + err = iter_release_acquired_ref(env, iter_st);
This hunk looks generic since it acts on PTR_TO_STACK,
but it's extremely specific for one kfunc that is KF_RELEASE,
and in this patchset it happens to be bpf_iter_task_vma_release(),
but any future kfunc that accepts ptr_to_stack will fall into
this case and we will have a problem.
ptr_to_stack doesn't have to be an iterator and it doesn't have
to be this new special iter_st->id.
This whole thing screams "hack" to me.
I also don't like this saving of 'id' from R0 and the
auto-magic release of that 'id' during the next _next() call.
Too many kfunc- and iter-specific things built into
very subtle verifier logic.
Also Amery is working on cleaning up id/ref_obj_id across reg types.
Currently pointers have an id where id == 0 is a valid id for packet pointers
and id > 1 gets cleared to zero for all.
Then scalars have an id where id == 0 is invalid.
Then dynptrs have an id.
And now iterators will have an id that means something completely different?
It's the saved acquired id returned from _next(). sigh.
KF_FORBID_FAULT is ok-ish. I don't like it, but whatever.
A trylock on every _next() is quite problematic.
And this repurposing of 'id' is the straw that breaks the camel's back.
Too many custom things across the verifier, kfuncs, iterators.
Sorry. This is no go.
We have to go back to the drawing board with the whole thing.
pw-bot: cr
Thread overview: 12+ messages
2026-02-26 16:14 [PATCH bpf v5 0/8] Introduce KF_FORBID_FAULT modifier for acquire/release kfuncs Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 1/8] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators Puranjay Mohan
2026-02-27 0:46 ` Alexei Starovoitov
2026-02-26 16:14 ` [PATCH bpf v5 2/8] bpf: consolidate sleepable checks in check_helper_call() Puranjay Mohan
2026-02-26 18:36 ` Eduard Zingerman
2026-02-26 16:14 ` [PATCH bpf v5 3/8] bpf: consolidate sleepable checks in check_kfunc_call() Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 4/8] bpf: consolidate sleepable checks in check_func_call() Puranjay Mohan
2026-02-26 19:00 ` Eduard Zingerman
2026-02-26 16:14 ` [PATCH bpf v5 5/8] bpf: Add KF_FORBID_FAULT modifier for KF_ACQUIRE kfuncs Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 6/8] bpf: Move locking to bpf_iter_task_vma_next() Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 7/8] bpf: Add split iteration support to task_vma iterator Puranjay Mohan
2026-02-26 16:14 ` [PATCH bpf v5 8/8] selftests/bpf: Add tests for split " Puranjay Mohan