* [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
@ 2026-02-23 17:46 Puranjay Mohan
From: Puranjay Mohan @ 2026-02-23 17:46 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
Changelog:
v2: https://lore.kernel.org/all/20260223160300.2109907-1-puranjay@kernel.org/
Changes in v2 -> v3:
- Rebased on bpf-next/master
v1: https://lore.kernel.org/all/20260218182555.1501495-1-puranjay@kernel.org/
Changes in v1 -> v2:
- Add a patch to consolidate sleepable context error message printing in
check_helper_call(); no functional changes (Eduard)
- In check_kfunc_call() for KF_RELEASE and __iter arg, use release_regno
like dynptr rather than custom handling (Mykyta)
- Fix some comments to follow correct style (Mykyta)
- Move state->forbid_sleep_count-- to release_reference_state() (Eduard)
- Remove error message in check_resource_leak() for forbid_sleep_count
because it is redundant and will never trigger (Eduard)
- Consolidated some checks and prints (Eduard)
Some BPF kfuncs acquire resources that prevent sleeping - a lock, a
reference to an object under a lock, a preempt-disable section. Today there
is no way for the verifier to know that holding a particular acquired
reference means sleeping is forbidden. Programs either run entirely in
sleepable or non-sleepable context, with no way to express "sleeping is
forbidden right now, but will be allowed once I release this reference."
This series adds KF_FORBID_SLEEP, a new kfunc flag that can be combined
with KF_ACQUIRE. When a kfunc annotated with KF_ACQUIRE | KF_FORBID_SLEEP
is called, the verifier tags the acquired reference with forbid_sleep and
increments a per-state forbid_sleep_count counter. When the reference is
released (through corresponding KF_RELEASE kfunc), the counter is
decremented. The verifier checks this counter everywhere it decides
whether sleeping is allowed: sleepable helpers, sleepable kfuncs, global
function calls, and resource leak checks.
This is fully generic. Any pair of KF_ACQUIRE/KF_RELEASE kfuncs can opt
into sleep prohibition by adding KF_FORBID_SLEEP to the acquire side.
To make this useful for iterators specifically, the series first extends
the verifier's iterator support to allow KF_ACQUIRE on _next and KF_RELEASE
on a separate kfunc taking an __iter argument. The verifier tracks the
acquired reference on the iterator's stack slot (st->id) and auto-releases
it on the next _next call and on the DRAINED (NULL) path, so the
acquire/release is transparent to programs that don't need mid-loop
release.
Iterator KF_ACQUIRE support is not useful on its own right now, but it
becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
pointer forbids sleeping; calling _release invalidates the pointer and
re-enables sleeping."
The task_vma iterator is the first user. It holds mmap_lock during
iteration, preventing sleepable helpers like bpf_copy_from_user().
Currently, this can lead to a deadlock as the fault path also takes
mmap_lock.
With this series, a BPF program can release the lock mid-iteration:
bpf_for_each(task_vma, vma, task, 0) {
u64 start = vma->vm_start;
/* sleeping forbidden, but vma pointer access allowed */
bpf_iter_task_vma_release(&___it);
/* mmap_lock released, vma pointer invalidated */
/* sleeping is fine here */
bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
}
The series is organized as:
Patch 1: KF_ACQUIRE/KF_RELEASE plumbing for iterators in the
verifier. Pure infrastructure, no behavioral change to
existing iterators.
Patch 2: Consolidate sleepable context error message printing
in check_helper_call(), no functional changes
Patch 3: KF_FORBID_SLEEP flag and forbid_sleep_count machinery.
Generic, works for any KF_ACQUIRE kfunc - iterator or not.
Patch 4: Move mmap_lock acquisition from _new to _next in the
task_vma iterator, preparing for re-acquisition after
release.
Patch 5: Wire up task_vma as the first user: annotate _next with
KF_ACQUIRE | KF_FORBID_SLEEP, add bpf_iter_task_vma_release().
Patch 6: Selftests covering the runtime path (release + copy_from_user)
and verifier rejection of invalid patterns (sleeping without
release, VMA access after release, double release, release
without acquire, nested iterator interaction).
Puranjay Mohan (6):
bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators
bpf: consolidate sleepable context error message printing
bpf: Add KF_FORBID_SLEEP modifier for KF_ACQUIRE kfuncs
bpf: Move locking to bpf_iter_task_vma_next()
bpf: Add split iteration support to task_vma iterator
selftests/bpf: Add tests for split task_vma iterator
include/linux/bpf_verifier.h | 2 +
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 3 +
kernel/bpf/helpers.c | 3 +-
kernel/bpf/task_iter.c | 44 ++++--
kernel/bpf/verifier.c | 147 +++++++++++++-----
.../testing/selftests/bpf/bpf_experimental.h | 1 +
.../testing/selftests/bpf/prog_tests/iters.c | 13 ++
.../selftests/bpf/progs/iters_task_vma.c | 39 +++++
.../bpf/progs/iters_task_vma_nosleep.c | 125 +++++++++++++++
10 files changed, 331 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c
base-commit: 3ecf0b4a0e0ed4783aa32c5f3e42d23c7021e1c8
--
2.47.3
* [PATCH bpf-next v3 1/6] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators
From: Puranjay Mohan @ 2026-02-23 17:46 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
Some iterators hold resources (like mmap_lock in task_vma) that prevent
sleeping. To allow BPF programs to release such resources mid-iteration
and call sleepable helpers, the verifier needs to track acquire/release
semantics on iterator _next pointers.
Repurpose the st->id field on STACK_ITER slots to track the ref_obj_id
of the pointer returned by _next when the kfunc is annotated with
KF_ACQUIRE. This is safe because st->id is initialized to 0 by
__mark_reg_known_zero() in mark_stack_slots_iter() and is not compared
in stacksafe() for STACK_ITER slots.
The lifecycle is:
_next (KF_ACQUIRE):
- auto-release old ref if st->id != 0
- acquire new ref, store ref_obj_id in st->id
- DRAINED branch: release via st->id, set st->id = 0
- ACTIVE branch: keeps ref, st->id tracks it
_release (KF_RELEASE + __iter arg):
- read st->id, release_reference(), set st->id = 0
_destroy:
- release st->id if non-zero before releasing iterator's own ref
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/verifier.c | 61 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 58 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2ef00f9b94fe..c693dd663cab 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1038,6 +1038,7 @@ static void __mark_reg_known_zero(struct bpf_reg_state *reg);
static bool in_rcu_cs(struct bpf_verifier_env *env);
static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta);
+static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta);
static int mark_stack_slots_iter(struct bpf_verifier_env *env,
struct bpf_kfunc_call_arg_meta *meta,
@@ -1083,6 +1084,23 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
return 0;
}
+/*
+ * Release the acquired reference tracked by iter_st->id, if any.
+ * Used during auto-release in _next, DRAINED handling, and _destroy.
+ */
+static int iter_release_acquired_ref(struct bpf_verifier_env *env,
+ struct bpf_reg_state *iter_st)
+{
+ int err;
+
+ if (!iter_st->id)
+ return 0;
+ err = release_reference(env, iter_st->id);
+ if (!err)
+ iter_st->id = 0;
+ return err;
+}
+
static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
struct bpf_reg_state *reg, int nr_slots)
{
@@ -1097,8 +1115,14 @@ static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
struct bpf_stack_state *slot = &state->stack[spi - i];
struct bpf_reg_state *st = &slot->spilled_ptr;
- if (i == 0)
+ if (i == 0) {
+ /*
+ * Release any outstanding acquired ref tracked by st->id
+ * before releasing the iterator's own ref.
+ */
+ WARN_ON_ONCE(iter_release_acquired_ref(env, st));
WARN_ON_ONCE(release_reference(env, st->ref_obj_id));
+ }
__mark_reg_not_init(env, st);
@@ -8943,6 +8967,8 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
/* remember meta->iter info for process_iter_next_call() */
meta->iter.spi = spi;
meta->iter.frameno = reg->frameno;
+ if (is_kfunc_release(meta))
+ meta->release_regno = regno;
meta->ref_obj_id = iter_ref_obj_id(env, reg, spi);
if (is_iter_destroy_kfunc(meta)) {
@@ -9178,8 +9204,11 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
/* mark current iter state as drained and assume returned NULL */
cur_iter->iter.state = BPF_ITER_STATE_DRAINED;
__mark_reg_const_zero(env, &cur_fr->regs[BPF_REG_0]);
-
- return 0;
+ /*
+ * If _next acquired a ref (KF_ACQUIRE), release it in the DRAINED branch since NULL
+ * was returned.
+ */
+ return iter_release_acquired_ref(env, cur_iter);
}
static bool arg_type_is_mem_size(enum bpf_arg_type type)
@@ -14197,6 +14226,17 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
if (meta.initialized_dynptr.ref_obj_id) {
err = unmark_stack_slots_dynptr(env, reg);
+ } else if (base_type(reg->type) == PTR_TO_STACK) {
+ struct bpf_reg_state *iter_st;
+
+ iter_st = get_iter_from_state(env->cur_state, &meta);
+ if (!iter_st->id) {
+ verbose(env, "no acquired reference to release\n");
+ return -EINVAL;
+ }
+ err = release_reference(env, iter_st->id);
+ if (!err)
+ iter_st->id = 0;
} else {
err = release_reference(env, reg->ref_obj_id);
if (err)
@@ -14274,6 +14314,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
__mark_reg_const_zero(env, &regs[BPF_REG_0]);
mark_btf_func_reg_size(env, BPF_REG_0, t->size);
} else if (btf_type_is_ptr(t)) {
+ struct bpf_reg_state *iter_acquire_st = NULL;
+
ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
err = check_special_kfunc(env, &meta, regs, insn_aux, ptr_type, desc_btf);
if (err) {
@@ -14356,6 +14398,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
regs[BPF_REG_0].id = ++env->id_gen;
}
mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
+ /*
+ * For iterators with KF_ACQUIRE, auto-release the previous iteration's ref before
+ * acquiring a new one, and after acquisition track the new ref on the iter slot.
+ */
+ if (is_iter_next_kfunc(&meta) && is_kfunc_acquire(&meta)) {
+ iter_acquire_st = get_iter_from_state(env->cur_state, &meta);
+ err = iter_release_acquired_ref(env, iter_acquire_st);
+ if (err)
+ return err;
+ }
if (is_kfunc_acquire(&meta)) {
int id = acquire_reference(env, insn_idx);
@@ -14368,6 +14420,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
ref_set_non_owning(env, &regs[BPF_REG_0]);
}
+ if (iter_acquire_st)
+ iter_acquire_st->id = regs[BPF_REG_0].ref_obj_id;
+
if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
regs[BPF_REG_0].id = ++env->id_gen;
} else if (btf_type_is_void(t)) {
--
2.47.3
* [PATCH bpf-next v3 2/6] bpf: consolidate sleepable context error message printing
From: Puranjay Mohan @ 2026-02-23 17:46 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
check_helper_call() prints a separate error message for each
env->cur_state->active* condition when a sleepable helper is called.
Consolidate them into a single print statement.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/verifier.c | 40 ++++++++++++++++++----------------------
1 file changed, 18 insertions(+), 22 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c693dd663cab..c2a63f8c8984 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11574,6 +11574,19 @@ static inline bool in_sleepable_context(struct bpf_verifier_env *env)
in_sleepable(env);
}
+static const char *non_sleepable_context_description(struct bpf_verifier_env *env)
+{
+ if (env->cur_state->active_rcu_locks)
+ return "rcu_read_lock region";
+ if (env->cur_state->active_preempt_locks)
+ return "non-preemptible region";
+ if (env->cur_state->active_irq_id)
+ return "IRQ-disabled region";
+ if (env->cur_state->active_locks)
+ return "lock region";
+ return "non-sleepable context";
+}
+
static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int *insn_idx_p)
{
@@ -11634,28 +11647,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return err;
}
- if (env->cur_state->active_rcu_locks) {
- if (fn->might_sleep) {
- verbose(env, "sleepable helper %s#%d in rcu_read_lock region\n",
- func_id_name(func_id), func_id);
- return -EINVAL;
- }
- }
-
- if (env->cur_state->active_preempt_locks) {
- if (fn->might_sleep) {
- verbose(env, "sleepable helper %s#%d in non-preemptible region\n",
- func_id_name(func_id), func_id);
- return -EINVAL;
- }
- }
-
- if (env->cur_state->active_irq_id) {
- if (fn->might_sleep) {
- verbose(env, "sleepable helper %s#%d in IRQ-disabled region\n",
- func_id_name(func_id), func_id);
- return -EINVAL;
- }
+ if (fn->might_sleep && !in_sleepable_context(env)) {
+ verbose(env, "sleepable helper %s#%d in %s\n",
+ func_id_name(func_id), func_id,
+ non_sleepable_context_description(env));
+ return -EINVAL;
}
/* Track non-sleepable context for helpers. */
--
2.47.3
* [PATCH bpf-next v3 3/6] bpf: Add KF_FORBID_SLEEP modifier for KF_ACQUIRE kfuncs
From: Puranjay Mohan @ 2026-02-23 17:46 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
With KF_ACQUIRE support for iterators in place, we need a way to tell
the verifier that holding a particular acquired reference forbids
sleeping. For example, task_vma's _next holds mmap_lock, so sleeping
between _next and _release must be rejected.
Add a KF_FORBID_SLEEP flag (1 << 17) that can be combined with
KF_ACQUIRE. When acquire_reference() is called for such a kfunc, the
reference is tagged with forbid_sleep=true and a per-state
forbid_sleep_count counter is incremented. When the reference is
released through release_reference_nomark(), the counter is decremented
in the same loop that already scans the refs array.
The counter is checked wherever the verifier decides if sleeping is
allowed.
This is generic and works for both iterator and non-iterator kfuncs.
For iterators, the auto-release and explicit _release from the previous
commit handle the counter decrement automatically via
release_reference().
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
include/linux/bpf_verifier.h | 2 ++
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 3 +++
kernel/bpf/verifier.c | 46 ++++++++++++++++++++++++++++--------
4 files changed, 42 insertions(+), 10 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index c1e30096ea7b..39904401df3d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -260,6 +260,7 @@ struct bpf_reference_state {
* it matches on unlock.
*/
void *ptr;
+ bool forbid_sleep; /* ref prevents sleeping while held */
};
struct bpf_retval_range {
@@ -420,6 +421,7 @@ struct bpf_verifier_state {
u32 active_lock_id;
void *active_lock_ptr;
u32 active_rcu_locks;
+ u32 forbid_sleep_count;
bool speculative;
bool in_sleepable;
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 48108471c5b1..c326c5ba49cb 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -79,6 +79,7 @@
#define KF_ARENA_ARG1 (1 << 14) /* kfunc takes an arena pointer as its first argument */
#define KF_ARENA_ARG2 (1 << 15) /* kfunc takes an arena pointer as its second argument */
#define KF_IMPLICIT_ARGS (1 << 16) /* kfunc has implicit arguments supplied by the verifier */
+#define KF_FORBID_SLEEP (1 << 17) /* acquired reference forbids sleeping while held */
/*
* Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 4872d2a6c42d..65327d2f19c1 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -8644,6 +8644,9 @@ static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
return err;
}
+ if ((func_flags & KF_FORBID_SLEEP) && !(func_flags & KF_ACQUIRE))
+ return -EINVAL;
+
return 0;
}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c2a63f8c8984..a90c3d1b2d72 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -202,7 +202,7 @@ struct bpf_verifier_stack_elem {
#define BPF_PRIV_STACK_MIN_SIZE 64
-static int acquire_reference(struct bpf_verifier_env *env, int insn_idx);
+static int acquire_reference(struct bpf_verifier_env *env, int insn_idx, bool forbid_sleep);
static int release_reference_nomark(struct bpf_verifier_state *state, int ref_obj_id);
static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
static void invalidate_non_owning_refs(struct bpf_verifier_env *env);
@@ -210,6 +210,7 @@ static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env);
static int ref_set_non_owning(struct bpf_verifier_env *env,
struct bpf_reg_state *reg);
static bool is_trusted_reg(const struct bpf_reg_state *reg);
+static inline bool in_sleepable_context(struct bpf_verifier_env *env);
static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
{
@@ -805,7 +806,7 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
if (clone_ref_obj_id)
id = clone_ref_obj_id;
else
- id = acquire_reference(env, insn_idx);
+ id = acquire_reference(env, insn_idx, false);
if (id < 0)
return id;
@@ -1052,7 +1053,7 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
if (spi < 0)
return spi;
- id = acquire_reference(env, insn_idx);
+ id = acquire_reference(env, insn_idx, false);
if (id < 0)
return id;
@@ -1474,6 +1475,7 @@ static int copy_reference_state(struct bpf_verifier_state *dst, const struct bpf
dst->active_irq_id = src->active_irq_id;
dst->active_lock_id = src->active_lock_id;
dst->active_lock_ptr = src->active_lock_ptr;
+ dst->forbid_sleep_count = src->forbid_sleep_count;
return 0;
}
@@ -1547,7 +1549,7 @@ static struct bpf_reference_state *acquire_reference_state(struct bpf_verifier_e
return &state->refs[new_ofs];
}
-static int acquire_reference(struct bpf_verifier_env *env, int insn_idx)
+static int acquire_reference(struct bpf_verifier_env *env, int insn_idx, bool forbid_sleep)
{
struct bpf_reference_state *s;
@@ -1556,6 +1558,9 @@ static int acquire_reference(struct bpf_verifier_env *env, int insn_idx)
return -ENOMEM;
s->type = REF_TYPE_PTR;
s->id = ++env->id_gen;
+ s->forbid_sleep = forbid_sleep;
+ if (forbid_sleep)
+ env->cur_state->forbid_sleep_count++;
return s->id;
}
@@ -1598,6 +1603,9 @@ static void release_reference_state(struct bpf_verifier_state *state, int idx)
int last_idx;
size_t rem;
+ if (state->refs[idx].forbid_sleep)
+ state->forbid_sleep_count--;
+
/* IRQ state requires the relative ordering of elements remaining the
* same, since it relies on the refs array to behave as a stack, so that
* it can detect out-of-order IRQ restore. Hence use memmove to shift
@@ -10845,9 +10853,7 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return -EINVAL;
}
- if (env->subprog_info[subprog].might_sleep &&
- (env->cur_state->active_rcu_locks || env->cur_state->active_preempt_locks ||
- env->cur_state->active_irq_id || !in_sleepable(env))) {
+ if (env->subprog_info[subprog].might_sleep && !in_sleepable_context(env)) {
verbose(env, "global functions that may sleep are not allowed in non-sleepable context,\n"
"i.e., in a RCU/IRQ/preempt-disabled section, or in\n"
"a non-sleepable BPF program context\n");
@@ -11571,6 +11577,7 @@ static inline bool in_sleepable_context(struct bpf_verifier_env *env)
!env->cur_state->active_preempt_locks &&
!env->cur_state->active_locks &&
!env->cur_state->active_irq_id &&
+ !env->cur_state->forbid_sleep_count &&
in_sleepable(env);
}
@@ -11584,6 +11591,8 @@ static const char *non_sleepable_context_description(struct bpf_verifier_env *en
return "IRQ-disabled region";
if (env->cur_state->active_locks)
return "lock region";
+ if (env->cur_state->forbid_sleep_count)
+ return "nosleep region";
return "non-sleepable context";
}
@@ -12035,7 +12044,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
/* For release_reference() */
regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
} else if (is_acquire_function(func_id, meta.map.ptr)) {
- int id = acquire_reference(env, insn_idx);
+ int id = acquire_reference(env, insn_idx, false);
if (id < 0)
return id;
@@ -12140,6 +12149,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
return meta->kfunc_flags & KF_RELEASE;
}
+static bool is_kfunc_forbid_sleep(struct bpf_kfunc_call_arg_meta *meta)
+{
+ return meta->kfunc_flags & KF_FORBID_SLEEP;
+}
+
static bool is_kfunc_sleepable(struct bpf_kfunc_call_arg_meta *meta)
{
return meta->kfunc_flags & KF_SLEEPABLE;
@@ -14214,6 +14228,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return -EACCES;
}
+ if (sleepable && env->cur_state->forbid_sleep_count) {
+ verbose(env, "sleepable kfunc %s in nosleep region\n", func_name);
+ return -EACCES;
+ }
+
/* In case of release function, we get register number of refcounted
* PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
*/
@@ -14405,7 +14424,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return err;
}
if (is_kfunc_acquire(&meta)) {
- int id = acquire_reference(env, insn_idx);
+ bool forbid_sleep = is_kfunc_forbid_sleep(&meta);
+ int id = acquire_reference(env, insn_idx, forbid_sleep);
if (id < 0)
return id;
@@ -20049,6 +20069,9 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c
if (old->active_rcu_locks != cur->active_rcu_locks)
return false;
+ if (old->forbid_sleep_count != cur->forbid_sleep_count)
+ return false;
+
if (!check_ids(old->active_irq_id, cur->active_irq_id, idmap))
return false;
@@ -20062,6 +20085,9 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c
return false;
switch (old->refs[i].type) {
case REF_TYPE_PTR:
+ if (old->refs[i].forbid_sleep != cur->refs[i].forbid_sleep)
+ return false;
+ break;
case REF_TYPE_IRQ:
break;
case REF_TYPE_LOCK:
@@ -24584,7 +24610,7 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
if (!subprog && env->prog->type == BPF_PROG_TYPE_STRUCT_OPS) {
for (i = 0; i < aux->ctx_arg_info_size; i++)
aux->ctx_arg_info[i].ref_obj_id = aux->ctx_arg_info[i].refcounted ?
- acquire_reference(env, 0) : 0;
+ acquire_reference(env, 0, false) : 0;
}
ret = do_check(env);
--
2.47.3
* [PATCH bpf-next v3 4/6] bpf: Move locking to bpf_iter_task_vma_next()
From: Puranjay Mohan @ 2026-02-23 17:46 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
The current implementation of the task_vma iterator takes mmap_lock in the
_new() function and holds it for the entire duration of the iteration. The
next commits will allow releasing the lock mid-iteration, which means the
_next() call must be able to re-take mmap_lock.
Move the mmap_lock acquisition to bpf_iter_task_vma_next().
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/task_iter.c | 34 +++++++++++++++++++++++-----------
1 file changed, 23 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 98d9b4c0daff..a85115c191e4 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -799,6 +799,8 @@ struct bpf_iter_task_vma_kern_data {
struct mm_struct *mm;
struct mmap_unlock_irq_work *work;
struct vma_iterator vmi;
+ u64 last_addr;
+ bool locked;
};
struct bpf_iter_task_vma {
@@ -819,7 +821,6 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
struct task_struct *task, u64 addr)
{
struct bpf_iter_task_vma_kern *kit = (void *)it;
- bool irq_work_busy = false;
int err;
BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
@@ -840,14 +841,8 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
goto err_cleanup_iter;
}
- /* kit->data->work == NULL is valid after bpf_mmap_unlock_get_irq_work */
- irq_work_busy = bpf_mmap_unlock_get_irq_work(&kit->data->work);
- if (irq_work_busy || !mmap_read_trylock(kit->data->mm)) {
- err = -EBUSY;
- goto err_cleanup_iter;
- }
-
- vma_iter_init(&kit->data->vmi, kit->data->mm, addr);
+ kit->data->locked = false;
+ kit->data->last_addr = addr;
return 0;
err_cleanup_iter:
@@ -862,10 +857,26 @@ __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
__bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it)
{
struct bpf_iter_task_vma_kern *kit = (void *)it;
+ struct vm_area_struct *vma;
if (!kit->data) /* bpf_iter_task_vma_new failed */
return NULL;
- return vma_next(&kit->data->vmi);
+
+ if (!kit->data->locked) {
+ bool irq_work_busy;
+
+ irq_work_busy = bpf_mmap_unlock_get_irq_work(&kit->data->work);
+ if (irq_work_busy || !mmap_read_trylock(kit->data->mm))
+ return NULL;
+
+ kit->data->locked = true;
+ vma_iter_init(&kit->data->vmi, kit->data->mm, kit->data->last_addr);
+ }
+
+ vma = vma_next(&kit->data->vmi);
+ if (vma)
+ kit->data->last_addr = vma->vm_end;
+ return vma;
}
__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
@@ -873,7 +884,8 @@ __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
struct bpf_iter_task_vma_kern *kit = (void *)it;
if (kit->data) {
- bpf_mmap_unlock_mm(kit->data->work, kit->data->mm);
+ if (kit->data->locked)
+ bpf_mmap_unlock_mm(kit->data->work, kit->data->mm);
put_task_struct(kit->data->task);
bpf_mem_free(&bpf_global_ma, kit->data);
}
--
2.47.3
* [PATCH bpf-next v3 5/6] bpf: Add split iteration support to task_vma iterator
From: Puranjay Mohan @ 2026-02-23 17:46 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
Wire up the task_vma iterator to use the KF_ACQUIRE/KF_RELEASE and
KF_FORBID_SLEEP infrastructure from the previous commits.
Annotate bpf_iter_task_vma_next with KF_ACQUIRE | KF_FORBID_SLEEP so
each returned VMA pointer becomes a tracked reference that prevents
sleeping. Add bpf_iter_task_vma_release(it__iter) with KF_RELEASE that
releases the mmap read lock, drops the acquired reference, and
re-enables sleeping. The next call to _next will re-acquire the lock
via mmap_read_trylock().
This enables a split iteration pattern:
bpf_for_each(task_vma, vma, task, 0) {
u64 start = vma->vm_start;
/* vma access allowed, sleeping forbidden */
bpf_iter_task_vma_release(&___it);
/* vma access forbidden, sleeping allowed */
bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
}
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/bpf/helpers.c | 3 ++-
kernel/bpf/task_iter.c | 10 ++++++++++
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6eb6c82ed2ee..2663280c92c6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -4605,8 +4605,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW | KF_RCU)
-BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL | KF_ACQUIRE | KF_FORBID_SLEEP)
BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_release, KF_RELEASE)
#ifdef CONFIG_CGROUPS
BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index a85115c191e4..55b264d634db 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -879,6 +879,16 @@ __bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_v
return vma;
}
+__bpf_kfunc void bpf_iter_task_vma_release(struct bpf_iter_task_vma *it__iter)
+{
+ struct bpf_iter_task_vma_kern *kit = (void *)it__iter;
+
+ if (kit->data && kit->data->locked) {
+ bpf_mmap_unlock_mm(kit->data->work, kit->data->mm);
+ kit->data->locked = false;
+ }
+}
+
__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
{
struct bpf_iter_task_vma_kern *kit = (void *)it;
--
2.47.3
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH bpf-next v3 6/6] selftests/bpf: Add tests for split task_vma iterator
2026-02-23 17:46 [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs Puranjay Mohan
` (4 preceding siblings ...)
2026-02-23 17:46 ` [PATCH bpf-next v3 5/6] bpf: Add split iteration support to task_vma iterator Puranjay Mohan
@ 2026-02-23 17:46 ` Puranjay Mohan
2026-02-24 1:49 ` [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs Alexei Starovoitov
6 siblings, 0 replies; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-23 17:46 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
Add runtime and verifier tests for the split task_vma iteration model.
The runtime test iter_task_vma_release_and_copy iterates VMAs, saves
vm_start, releases mmap_lock via bpf_iter_task_vma_release(), then
calls bpf_copy_from_user() on the saved address.
The verifier tests cover:
- nosleep_iter_release_then_sleep: release then sleep is accepted
- nosleep_iter_sleep_without_release: sleep without release is rejected
with "in nosleep region"
- nosleep_iter_vma_access_after_release: VMA access after release is
rejected with "invalid mem access 'scalar'"
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
.../testing/selftests/bpf/bpf_experimental.h | 1 +
.../testing/selftests/bpf/prog_tests/iters.c | 13 ++
.../selftests/bpf/progs/iters_task_vma.c | 39 ++++++
.../bpf/progs/iters_task_vma_nosleep.c | 125 ++++++++++++++++++
4 files changed, 178 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 9df77e59d4f5..531d6c6aab45 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -165,6 +165,7 @@ extern int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
struct task_struct *task,
__u64 addr) __ksym;
extern struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it) __ksym;
+extern void bpf_iter_task_vma_release(struct bpf_iter_task_vma *it) __ksym;
extern void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __ksym;
/* Convenience macro to wrap over bpf_obj_drop_impl */
diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index a539980a2fbe..977114f0e88f 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -21,6 +21,7 @@
#include "iters_css_task.skel.h"
#include "iters_css.skel.h"
#include "iters_task_failure.skel.h"
+#include "iters_task_vma_nosleep.skel.h"
static void subtest_num_iters(void)
{
@@ -152,6 +153,17 @@ static void subtest_task_vma_iters(void)
if (!ASSERT_EQ(skel->bss->vmas_seen, seen, "vmas_seen_eq"))
goto cleanup;
+ /* Test release+sleepable: trigger the release_and_copy program */
+ skel->bss->release_vmas_seen = 0;
+ err = iters_task_vma__attach(skel);
+ if (!ASSERT_OK(err, "skel_reattach"))
+ goto cleanup;
+
+ getpgid(skel->bss->target_pid);
+ iters_task_vma__detach(skel);
+
+ ASSERT_GT(skel->bss->release_vmas_seen, 0, "release_vmas_seen_gt_zero");
+
cleanup:
if (f)
fclose(f);
@@ -322,4 +334,5 @@ void test_iters(void)
if (test__start_subtest("css"))
subtest_css_iters();
RUN_TESTS(iters_task_failure);
+ RUN_TESTS(iters_task_vma_nosleep);
}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_vma.c b/tools/testing/selftests/bpf/progs/iters_task_vma.c
index dc0c3691dcc2..59065f9da4f8 100644
--- a/tools/testing/selftests/bpf/progs/iters_task_vma.c
+++ b/tools/testing/selftests/bpf/progs/iters_task_vma.c
@@ -40,4 +40,43 @@ int iter_task_vma_for_each(const void *ctx)
return 0;
}
+unsigned int release_vmas_seen = 0;
+
+SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_task_vma_release_and_copy(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+ unsigned int seen = 0;
+
+ if (task->pid != target_pid)
+ return 0;
+
+ if (release_vmas_seen)
+ return 0;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ __u64 start;
+ char buf[8];
+
+ if (bpf_cmp_unlikely(seen, >=, 1000))
+ break;
+
+ /* Phase 1: mmap_lock held, read VMA data */
+ start = vma->vm_start;
+
+ /* Transition: release mmap_lock */
+ bpf_iter_task_vma_release(&___it);
+ /* VMA pointer is now invalid; sleepable helpers allowed */
+
+ /* Phase 2: mmap_lock released, sleepable call */
+ bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
+
+ seen++;
+ }
+
+ release_vmas_seen = seen;
+ return 0;
+}
+
char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c b/tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c
new file mode 100644
index 000000000000..ab607e29b36a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_vma_nosleep.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include "bpf_experimental.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+/* Negative test: sleepable call without release should be rejected */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("sleepable helper bpf_copy_from_user#148 in nosleep region")
+int nosleep_iter_sleep_without_release(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ char buf[8];
+
+ /* Attempt to call sleepable helper without releasing mmap_lock.
+ * Verifier should reject this.
+ */
+ bpf_copy_from_user(&buf, sizeof(buf), (void *)vma->vm_start);
+ break;
+ }
+ return 0;
+}
+
+/* Negative test: VMA access after release should be rejected */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("invalid mem access 'scalar'")
+int nosleep_iter_vma_access_after_release(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+ __u64 val = 0;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ bpf_iter_task_vma_release(&___it);
+ /* VMA pointer is now invalid. Accessing it should be rejected. */
+ val = vma->vm_start;
+ break;
+ }
+ __sink(val);
+ return 0;
+}
+
+/* Positive test: release then sleepable call should succeed */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__success
+int nosleep_iter_release_then_sleep(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ __u64 start = vma->vm_start;
+ char buf[8];
+
+ bpf_iter_task_vma_release(&___it);
+ bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
+ break;
+ }
+ return 0;
+}
+
+/* Negative test: double release should be rejected */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("no acquired reference to release")
+int nosleep_iter_double_release(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ bpf_iter_task_vma_release(&___it);
+ bpf_iter_task_vma_release(&___it);
+ break;
+ }
+ return 0;
+}
+
+/* Negative test: release without any prior _next acquire */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("no acquired reference to release")
+int nosleep_iter_release_without_acquire(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct bpf_iter_task_vma it;
+
+ bpf_iter_task_vma_new(&it, task, 0);
+ /* No _next called, so no acquired reference exists */
+ bpf_iter_task_vma_release(&it);
+ bpf_iter_task_vma_destroy(&it);
+ return 0;
+}
+
+/* Negative test: nested iterators, releasing inner should not allow sleeping
+ * because the outer iterator still holds a nosleep reference.
+ */
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("sleepable helper bpf_copy_from_user#148 in nosleep region")
+int nosleep_iter_nested_release_inner(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma_outer, *vma_inner;
+
+ bpf_for_each(task_vma, vma_outer, task, 0) {
+ bpf_for_each(task_vma, vma_inner, task, 0) {
+ __u64 start = vma_inner->vm_start;
+ char buf[8];
+
+ bpf_iter_task_vma_release(&___it);
+ /* Inner released, but outer still holds nosleep ref.
+ * Sleeping should still be forbidden.
+ */
+ bpf_copy_from_user(&buf, sizeof(buf), (void *)start);
+ break;
+ }
+ break;
+ }
+ return 0;
+}
--
2.47.3
* Re: [PATCH bpf-next v3 1/6] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators
2026-02-23 17:46 ` [PATCH bpf-next v3 1/6] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators Puranjay Mohan
@ 2026-02-23 19:59 ` Mykyta Yatsenko
2026-02-23 20:41 ` Eduard Zingerman
1 sibling, 0 replies; 24+ messages in thread
From: Mykyta Yatsenko @ 2026-02-23 19:59 UTC (permalink / raw)
To: Puranjay Mohan, bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, kernel-team
Puranjay Mohan <puranjay@kernel.org> writes:
> Some iterators hold resources (like mmap_lock in task_vma) that prevent
> sleeping. To allow BPF programs to release such resources mid-iteration
> and call sleepable helpers, the verifier needs to track acquire/release
> semantics on iterator _next pointers.
>
> Repurpose the st->id field on STACK_ITER slots to track the ref_obj_id
> of the pointer returned by _next when the kfunc is annotated with
> KF_ACQUIRE. This is safe because st->id is initialized to 0 by
> __mark_reg_known_zero() in mark_stack_slots_iter() and is not compared
> in stacksafe() for STACK_ITER slots.
>
> The lifecycle is:
>
> _next (KF_ACQUIRE):
> - auto-release old ref if st->id != 0
> - acquire new ref, store ref_obj_id in st->id
> - DRAINED branch: release via st->id, set st->id = 0
> - ACTIVE branch: keeps ref, st->id tracks it
>
> _release (KF_RELEASE + __iter arg):
> - read st->id, release_reference(), set st->id = 0
>
> _destroy:
> - release st->id if non-zero before releasing iterator's own ref
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
> kernel/bpf/verifier.c | 61 ++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 58 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2ef00f9b94fe..c693dd663cab 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1038,6 +1038,7 @@ static void __mark_reg_known_zero(struct bpf_reg_state *reg);
> static bool in_rcu_cs(struct bpf_verifier_env *env);
>
> static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta);
> +static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta);
>
> static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> struct bpf_kfunc_call_arg_meta *meta,
> @@ -1083,6 +1084,23 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> return 0;
> }
>
> +/*
> + * Release the acquired reference tracked by iter_st->id, if any.
> + * Used during auto-release in _next, DRAINED handling, and _destroy.
> + */
> +static int iter_release_acquired_ref(struct bpf_verifier_env *env,
> + struct bpf_reg_state *iter_st)
> +{
> + int err;
> +
> + if (!iter_st->id)
> + return 0;
> + err = release_reference(env, iter_st->id);
> + if (!err)
> + iter_st->id = 0;
> + return err;
> +}
> +
> static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
> struct bpf_reg_state *reg, int nr_slots)
> {
> @@ -1097,8 +1115,14 @@ static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
> struct bpf_stack_state *slot = &state->stack[spi - i];
> struct bpf_reg_state *st = &slot->spilled_ptr;
>
> - if (i == 0)
> + if (i == 0) {
> + /*
> + * Release any outstanding acquired ref tracked by st->id
> + * before releasing the iterator's own ref.
> + */
> + WARN_ON_ONCE(iter_release_acquired_ref(env, st));
> WARN_ON_ONCE(release_reference(env, st->ref_obj_id));
> + }
>
> __mark_reg_not_init(env, st);
>
> @@ -8943,6 +8967,8 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
> /* remember meta->iter info for process_iter_next_call() */
> meta->iter.spi = spi;
> meta->iter.frameno = reg->frameno;
> + if (is_kfunc_release(meta))
> + meta->release_regno = regno;
> meta->ref_obj_id = iter_ref_obj_id(env, reg, spi);
>
> if (is_iter_destroy_kfunc(meta)) {
> @@ -9178,8 +9204,11 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> /* mark current iter state as drained and assume returned NULL */
> cur_iter->iter.state = BPF_ITER_STATE_DRAINED;
> __mark_reg_const_zero(env, &cur_fr->regs[BPF_REG_0]);
> -
> - return 0;
> + /*
> + * If _next acquired a ref (KF_ACQUIRE), release it in the DRAINED branch since NULL
> + * was returned.
> + */
> + return iter_release_acquired_ref(env, cur_iter);
> }
>
> static bool arg_type_is_mem_size(enum bpf_arg_type type)
> @@ -14197,6 +14226,17 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>
> if (meta.initialized_dynptr.ref_obj_id) {
> err = unmark_stack_slots_dynptr(env, reg);
> + } else if (base_type(reg->type) == PTR_TO_STACK) {
Do we also need to check that corresponding stack slot is iterator?
I see there is is_iter_reg_valid_init() that does that, maybe we can use
it here.
> + struct bpf_reg_state *iter_st;
> +
> + iter_st = get_iter_from_state(env->cur_state, &meta);
> + if (!iter_st->id) {
> + verbose(env, "no acquired reference to release\n");
> + return -EINVAL;
> + }
> + err = release_reference(env, iter_st->id);
> + if (!err)
> + iter_st->id = 0;
> } else {
> err = release_reference(env, reg->ref_obj_id);
> if (err)
> @@ -14274,6 +14314,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> __mark_reg_const_zero(env, ®s[BPF_REG_0]);
> mark_btf_func_reg_size(env, BPF_REG_0, t->size);
> } else if (btf_type_is_ptr(t)) {
> + struct bpf_reg_state *iter_acquire_st = NULL;
> +
> ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
> err = check_special_kfunc(env, &meta, regs, insn_aux, ptr_type, desc_btf);
> if (err) {
> @@ -14356,6 +14398,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> regs[BPF_REG_0].id = ++env->id_gen;
> }
> mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
> + /*
> + * For iterators with KF_ACQUIRE, auto-release the previous iteration's ref before
> + * acquiring a new one, and after acquisition track the new ref on the iter slot.
> + */
> + if (is_iter_next_kfunc(&meta) && is_kfunc_acquire(&meta)) {
> + iter_acquire_st = get_iter_from_state(env->cur_state, &meta);
> + err = iter_release_acquired_ref(env, iter_acquire_st);
> + if (err)
> + return err;
> + }
> if (is_kfunc_acquire(&meta)) {
nit: what if we move is_iter_next_kfunc() check into this block, so that
we have only one place where we check is_kfunc_acquire() in this function?
> int id = acquire_reference(env, insn_idx);
>
> @@ -14368,6 +14420,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> ref_set_non_owning(env, ®s[BPF_REG_0]);
> }
>
> + if (iter_acquire_st)
> + iter_acquire_st->id = regs[BPF_REG_0].ref_obj_id;
> +
> if (reg_may_point_to_spin_lock(®s[BPF_REG_0]) && !regs[BPF_REG_0].id)
> regs[BPF_REG_0].id = ++env->id_gen;
> } else if (btf_type_is_void(t)) {
> --
> 2.47.3
* Re: [PATCH bpf-next v3 2/6] bpf: consolidate sleepable context error message printing
2026-02-23 17:46 ` [PATCH bpf-next v3 2/6] bpf: consolidate sleepable context error message printing Puranjay Mohan
@ 2026-02-23 20:06 ` Mykyta Yatsenko
2026-02-23 20:27 ` Eduard Zingerman
0 siblings, 1 reply; 24+ messages in thread
From: Mykyta Yatsenko @ 2026-02-23 20:06 UTC (permalink / raw)
To: Puranjay Mohan, bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, kernel-team
Puranjay Mohan <puranjay@kernel.org> writes:
> check_helper_call() prints the error message for every
> env->cur_state->active* element when calling a sleepable helper.
> Consolidate all of them into a single print statement.
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
> kernel/bpf/verifier.c | 40 ++++++++++++++++++----------------------
> 1 file changed, 18 insertions(+), 22 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index c693dd663cab..c2a63f8c8984 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -11574,6 +11574,19 @@ static inline bool in_sleepable_context(struct bpf_verifier_env *env)
> in_sleepable(env);
> }
>
> +static const char *non_sleepable_context_description(struct bpf_verifier_env *env)
> +{
> + if (env->cur_state->active_rcu_locks)
> + return "rcu_read_lock region";
> + if (env->cur_state->active_preempt_locks)
> + return "non-preemptible region";
> + if (env->cur_state->active_irq_id)
> + return "IRQ-disabled region";
> + if (env->cur_state->active_locks)
> + return "lock region";
This check didn't exist before, did it? It looks right, but adding
something into the commit message to explain this would be nice.
Does it mean we can currently call a sleepable helper in the lock
region?
> + return "non-sleepable context";
> +}
> +
> static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> int *insn_idx_p)
> {
> @@ -11634,28 +11647,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> return err;
> }
>
> - if (env->cur_state->active_rcu_locks) {
> - if (fn->might_sleep) {
> - verbose(env, "sleepable helper %s#%d in rcu_read_lock region\n",
> - func_id_name(func_id), func_id);
> - return -EINVAL;
> - }
> - }
> -
> - if (env->cur_state->active_preempt_locks) {
> - if (fn->might_sleep) {
> - verbose(env, "sleepable helper %s#%d in non-preemptible region\n",
> - func_id_name(func_id), func_id);
> - return -EINVAL;
> - }
> - }
> -
> - if (env->cur_state->active_irq_id) {
> - if (fn->might_sleep) {
> - verbose(env, "sleepable helper %s#%d in IRQ-disabled region\n",
> - func_id_name(func_id), func_id);
> - return -EINVAL;
> - }
> + if (fn->might_sleep && !in_sleepable_context(env)) {
> + verbose(env, "sleepable helper %s#%d in %s\n",
> + func_id_name(func_id), func_id,
> + non_sleepable_context_description(env));
> + return -EINVAL;
> }
>
> /* Track non-sleepable context for helpers. */
> --
> 2.47.3
* Re: [PATCH bpf-next v3 2/6] bpf: consolidate sleepable context error message printing
2026-02-23 20:06 ` Mykyta Yatsenko
@ 2026-02-23 20:27 ` Eduard Zingerman
0 siblings, 0 replies; 24+ messages in thread
From: Eduard Zingerman @ 2026-02-23 20:27 UTC (permalink / raw)
To: Mykyta Yatsenko, Puranjay Mohan, bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
kernel-team
On Mon, 2026-02-23 at 20:06 +0000, Mykyta Yatsenko wrote:
> Puranjay Mohan <puranjay@kernel.org> writes:
>
> > check_helper_call() prints the error message for every
> > env->cur_state->active* element when calling a sleepable helper.
> > Consolidate all of them into a single print statement.
> >
> > Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> > ---
> > kernel/bpf/verifier.c | 40 ++++++++++++++++++----------------------
> > 1 file changed, 18 insertions(+), 22 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index c693dd663cab..c2a63f8c8984 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -11574,6 +11574,19 @@ static inline bool in_sleepable_context(struct bpf_verifier_env *env)
> > in_sleepable(env);
> > }
> >
> > +static const char *non_sleepable_context_description(struct bpf_verifier_env *env)
> > +{
> > + if (env->cur_state->active_rcu_locks)
> > + return "rcu_read_lock region";
> > + if (env->cur_state->active_preempt_locks)
> > + return "non-preemptible region";
> > + if (env->cur_state->active_irq_id)
> > + return "IRQ-disabled region";
> > + if (env->cur_state->active_locks)
> > + return "lock region";
> This check didn't exist before, did it? It looks right, but adding
> something into the commit message to explain this would be nice.
> Does it mean we can currently call a sleepable helper in the lock
> region?
This triggered me as well. Claude pointed here:
> The restriction is at verifier.c:20999–21007, in do_check() — before check_helper_call() is even reached:
> if (env->cur_state->active_locks) {
> if ((insn->src_reg == BPF_REG_0 &&
> insn->imm != BPF_FUNC_spin_unlock) ||
> (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> (insn->off != 0 || !kfunc_spin_allowed(insn->imm)))) {
> verbose(env,
> "function calls are not allowed while holding a lock\n");
> return -EINVAL;
> }
> }
Commenting about it in the commit message would be helpful 100%.
> > + return "non-sleepable context";
> > +}
> > +
[...]
* Re: [PATCH bpf-next v3 1/6] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators
2026-02-23 17:46 ` [PATCH bpf-next v3 1/6] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators Puranjay Mohan
2026-02-23 19:59 ` Mykyta Yatsenko
@ 2026-02-23 20:41 ` Eduard Zingerman
1 sibling, 0 replies; 24+ messages in thread
From: Eduard Zingerman @ 2026-02-23 20:41 UTC (permalink / raw)
To: Puranjay Mohan, bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, kernel-team
On Mon, 2026-02-23 at 09:46 -0800, Puranjay Mohan wrote:
[...]
> @@ -1083,6 +1084,23 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> return 0;
> }
>
> +/*
> + * Release the acquired reference tracked by iter_st->id, if any.
> + * Used during auto-release in _next, DRAINED handling, and _destroy.
> + */
> +static int iter_release_acquired_ref(struct bpf_verifier_env *env,
> + struct bpf_reg_state *iter_st)
> +{
> + int err;
> +
> + if (!iter_st->id)
> + return 0;
> + err = release_reference(env, iter_st->id);
> + if (!err)
> + iter_st->id = 0;
Nit: maybe drop the `!err` check?
It requires a few moments to process mentally and the verifier
should stop on this error anyway.
> + return err;
> +}
> +
> static int unmark_stack_slots_iter(struct bpf_verifier_env *env,
> struct bpf_reg_state *reg, int nr_slots)
> {
[...]
> @@ -9178,8 +9204,11 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> /* mark current iter state as drained and assume returned NULL */
> cur_iter->iter.state = BPF_ITER_STATE_DRAINED;
> __mark_reg_const_zero(env, &cur_fr->regs[BPF_REG_0]);
> -
> - return 0;
> + /*
> + * If _next acquired a ref (KF_ACQUIRE), release it in the DRAINED branch since NULL
> + * was returned.
> + */
> + return iter_release_acquired_ref(env, cur_iter);
Nit: Check that `_next` is KF_ACQUIRE here?
Not strictly needed for this patch but is a bit more consistent.
E.g. consider some future iterator that acquires reference on creation
and not on `_next`.
> }
>
> static bool arg_type_is_mem_size(enum bpf_arg_type type)
> @@ -14197,6 +14226,17 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>
> if (meta.initialized_dynptr.ref_obj_id) {
> err = unmark_stack_slots_dynptr(env, reg);
> + } else if (base_type(reg->type) == PTR_TO_STACK) {
> + struct bpf_reg_state *iter_st;
> +
> + iter_st = get_iter_from_state(env->cur_state, &meta);
> + if (!iter_st->id) {
> + verbose(env, "no acquired reference to release\n");
> + return -EINVAL;
> + }
> + err = release_reference(env, iter_st->id);
> + if (!err)
> + iter_st->id = 0;
Nit: iter_release_acquired_ref()?
> } else {
> err = release_reference(env, reg->ref_obj_id);
> if (err)
[...]
* Re: [PATCH bpf-next v3 3/6] bpf: Add KF_FORBID_SLEEP modifier for KF_ACQUIRE kfuncs
2026-02-23 17:46 ` [PATCH bpf-next v3 3/6] bpf: Add KF_FORBID_SLEEP modifier for KF_ACQUIRE kfuncs Puranjay Mohan
@ 2026-02-23 22:14 ` Eduard Zingerman
2026-02-24 15:24 ` Puranjay Mohan
0 siblings, 1 reply; 24+ messages in thread
From: Eduard Zingerman @ 2026-02-23 22:14 UTC (permalink / raw)
To: Puranjay Mohan, bpf
Cc: Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, kernel-team
[-- Attachment #1: Type: text/plain, Size: 1808 bytes --]
On Mon, 2026-02-23 at 09:46 -0800, Puranjay Mohan wrote:
[...]
Would it be possible to move regular spin locks to similar semantics
(reference forbidding sleep)?
[...]
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 4872d2a6c42d..65327d2f19c1 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -8644,6 +8644,9 @@ static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
> return err;
> }
>
> + if ((func_flags & KF_FORBID_SLEEP) && !(func_flags & KF_ACQUIRE))
> + return -EINVAL;
> +
Are you sure this check is necessary?
> return 0;
> }
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index c2a63f8c8984..a90c3d1b2d72 100644
[...]
> @@ -14214,6 +14228,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> return -EACCES;
> }
>
> + if (sleepable && env->cur_state->forbid_sleep_count) {
> + verbose(env, "sleepable kfunc %s in nosleep region\n", func_name);
> + return -EACCES;
> + }
> +
Instead of adding this hunk please do the following:
- add a refactoring patch as in the attachment;
- remove the following check from do_check_insn():
> if (env->cur_state->active_locks) {
> if ((insn->src_reg == BPF_REG_0 &&
> insn->imm != BPF_FUNC_spin_unlock) ||
> (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> (insn->off != 0 || !kfunc_spin_allowed(insn->imm)))) {
> verbose(env,
> "function calls are not allowed while holding a lock\n");
> return -EINVAL;
> }
> }
Relying instead on `sleepable && !in_sleepable_context()` checks in
check_kfunc_call() and check_helper_call(). (As one more refactoring patch).
[...]
[-- Attachment #2: check-kfunc-call.diff --]
[-- Type: text/x-patch, Size: 1927 bytes --]
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2ef00f9b94fe..4a5d742fe4a4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14153,34 +14166,22 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
}));
}
- } else if (sleepable && env->cur_state->active_rcu_locks) {
- verbose(env, "kernel func %s is sleepable within rcu_read_lock region\n", func_name);
- return -EACCES;
- }
-
- if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
- verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
- return -EACCES;
- }
-
- if (env->cur_state->active_preempt_locks) {
- if (preempt_disable) {
- env->cur_state->active_preempt_locks++;
- } else if (preempt_enable) {
- env->cur_state->active_preempt_locks--;
- } else if (sleepable) {
- verbose(env, "kernel func %s is sleepable within non-preemptible region\n", func_name);
- return -EACCES;
- }
} else if (preempt_disable) {
env->cur_state->active_preempt_locks++;
} else if (preempt_enable) {
- verbose(env, "unmatched attempt to enable preemption (kernel function %s)\n", func_name);
- return -EINVAL;
+ if (env->cur_state->active_preempt_locks == 0) {
+ verbose(env, "unmatched attempt to enable preemption (kernel function %s)\n", func_name);
+ return -EINVAL;
+ }
+ env->cur_state->active_preempt_locks--;
+ } else if (sleepable && !in_sleepable_context(env)) {
+ verbose(env, "kernel func %s is sleepable within %s\n",
+ func_name, non_sleepable_context_description(env));
+ return -EACCES;
}
- if (env->cur_state->active_irq_id && sleepable) {
- verbose(env, "kernel func %s is sleepable within IRQ-disabled region\n", func_name);
+ if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
+ verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
return -EACCES;
}
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-23 17:46 [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs Puranjay Mohan
` (5 preceding siblings ...)
2026-02-23 17:46 ` [PATCH bpf-next v3 6/6] selftests/bpf: Add tests for split " Puranjay Mohan
@ 2026-02-24 1:49 ` Alexei Starovoitov
2026-02-24 11:24 ` Puranjay Mohan
6 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2026-02-24 1:49 UTC (permalink / raw)
To: Puranjay Mohan
Cc: bpf, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team
On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
>
> Iterator KF_ACQUIRE support is not useful on its own right now, but it
> becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> pointer forbids sleeping; calling _release invalidates the pointer and
> re-enables sleeping."
>
The name of the flag and its semantics bother me a bit.
How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
And document it like:
KF_ACQUIRE -> acquires a resource via reference counting
KF_ACQUIRE_LOCK -> acquires a resource by locking it
and to match it I would do:
KF_RELEASE -> releases a resource by decrementing refcount
KF_RELEASE_LOCK -> releases a resource by unlocking it.
When lock is taken it's typically prohibited to sleep and do
other things. So I feel such flags would describe what's
going on underneath more accurately.
Thoughts?
btw do we need a selftest that 2nd task_iter cannot call iter_next()?
Like:
  iter_new(it1);
  iter_new(it2);
  iter_next(it1);
  iter_next(it2); // this one has nothing to do with sleeping
but once we think of iter_next as KF_ACQUIRE_LOCK
it becomes obvious that it's not ok to re-acquire the same lock.
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-24 1:49 ` [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs Alexei Starovoitov
@ 2026-02-24 11:24 ` Puranjay Mohan
2026-02-24 18:00 ` Alexei Starovoitov
0 siblings, 1 reply; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-24 11:24 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Kernel Team
On Tue, Feb 24, 2026 at 1:49 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> >
> >
> > Iterator KF_ACQUIRE support is not useful on its own right now, but it
> > becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> > annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> > pointer forbids sleeping; calling _release invalidates the pointer and
> > re-enables sleeping."
> >
>
> The name of the flag and semantics bothers me a bit.
> How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
> And document it like:
> KF_ACQUIRE -> acquires a resource via reference counting
> KF_ACQUIRE_LOCK -> acquires a resource by locking it
>
> and to match it I would do:
> KF_RELEASE -> releases a resource by decrementing refcount
> KF_RELEASE_LOCK -> releases a resource by unlocking it.
>
> When lock is taken it's typically prohibited to sleep and do
> other things. So I feel such flags would describe what's
> going on underneath more accurately.
>
> Thoughts?
I am fine with this, except I think we should still keep KF_RELEASE as
a single release mechanism that releases according to how the
reference was taken (refcount/lock). There is nothing wrong with
separate KF_RELEASE and KF_RELEASE_LOCK, except that we would have to
reject all wrong usages like KF_ACQUIRE -> KF_RELEASE_LOCK and
KF_ACQUIRE_LOCK -> KF_RELEASE. I am fine with both but prefer the
common KF_RELEASE. Let me know which one you like and I will go ahead
with it.
> btw do we need a selftest that 2nd task_iter cannot call iter_next()?
> Like:
> iter_new(it1);
> iter_new(it2);
> iter_next(it1);
> iter_next(it2); // this one has nothing to do with sleeping
> but once we think of iter_next as KF_ACQUIRE_LOCK
> it becomes obvious that it's not ok to re-acquire the same lock.
But we still need to allow this if the locks are different, because
that will support the nested iterator use case. For example, nested
task_vma iterators that iterate vmas for different task structs
(different mm->mmap_lock). And I think the check for re-acquiring the
lock for the same iterator should be done at runtime, as it is done
currently in the task vma iterator. I think what you are saying is
that we could have a certain class of iterators that take a global
lock, so only one iterator of that type can be used at a time (nesting
not supported by design); even that should be taken care of at runtime
rather than the verifier tracking it.
* Re: [PATCH bpf-next v3 3/6] bpf: Add KF_FORBID_SLEEP modifier for KF_ACQUIRE kfuncs
2026-02-23 22:14 ` Eduard Zingerman
@ 2026-02-24 15:24 ` Puranjay Mohan
2026-02-24 18:17 ` Eduard Zingerman
0 siblings, 1 reply; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-24 15:24 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
On Mon, Feb 23, 2026 at 10:15 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2026-02-23 at 09:46 -0800, Puranjay Mohan wrote:
>
> [...]
>
> Would it be possible to move regular spin locks to similar semantics
> (reference forbidding sleep)?
>
> [...]
>
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 4872d2a6c42d..65327d2f19c1 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -8644,6 +8644,9 @@ static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
> > return err;
> > }
> >
> > + if ((func_flags & KF_FORBID_SLEEP) && !(func_flags & KF_ACQUIRE))
> > + return -EINVAL;
> > +
>
> Are you sure this check is necessary?
>
> > return 0;
> > }
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index c2a63f8c8984..a90c3d1b2d72 100644
>
> [...]
>
> > @@ -14214,6 +14228,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> > return -EACCES;
> > }
> >
> > + if (sleepable && env->cur_state->forbid_sleep_count) {
> > + verbose(env, "sleepable kfunc %s in nosleep region\n", func_name);
> > + return -EACCES;
> > + }
> > +
>
> Instead of adding this hunk please do the following:
> - add a refactoring patch as in the attachment;
> - remove the following check from do_check_insn():
>
> > if (env->cur_state->active_locks) {
> > if ((insn->src_reg == BPF_REG_0 &&
> > insn->imm != BPF_FUNC_spin_unlock) ||
> > (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> > (insn->off != 0 || !kfunc_spin_allowed(insn->imm)))) {
> > verbose(env,
> > "function calls are not allowed while holding a lock\n");
> > return -EINVAL;
> > }
> > }
>
> Relying instead on `sleepable && !in_sleepable_context()` checks in
> check_kfunc_call() and check_helper_call(). (As one more refactoring patch).
But removing the above hunk has other effects: this check is not just
about sleeping, it covers all function calls under bpf_spin_lock. Do
we want to relax this now?
>
> [...]
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-24 11:24 ` Puranjay Mohan
@ 2026-02-24 18:00 ` Alexei Starovoitov
2026-02-24 18:55 ` Eduard Zingerman
0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2026-02-24 18:00 UTC (permalink / raw)
To: Puranjay Mohan
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Kernel Team
On Tue, Feb 24, 2026 at 3:24 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
>
> On Tue, Feb 24, 2026 at 1:49 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> > >
> > >
> > > Iterator KF_ACQUIRE support is not useful on its own right now, but it
> > > becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> > > annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> > > pointer forbids sleeping; calling _release invalidates the pointer and
> > > re-enables sleeping."
> > >
> >
> > The name of the flag and semantics bothers me a bit.
> > How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
> > And document it like:
> > KF_ACQUIRE -> acquires a resource via reference counting
> > KF_ACQUIRE_LOCK -> acquires a resource by locking it
> >
> > and to match it I would do:
> > KF_RELEASE -> releases a resource by decrementing refcount
> > KF_RELEASE_LOCK -> releases a resource by unlocking it.
> >
> > When lock is taken it's typically prohibited to sleep and do
> > other things. So I feel such flags would describe what's
> > going on underneath more accurately.
> >
> > Thoughts?
>
> I am fine with this except, I think we should still keep the
> KF_RELEASE a single release mechanism that releases according to how
> the reference was taken (refcount/lock). There is nothing wrong with
> separate KF_RELEASE and KF_RELEASE_LOCK, except we would have to
> reject all wrong usages like KF_ACQUIRE -> KF_RELEASE_LOCK,
> KF_ACQUIRE_LOCK -> KF_RELEASE.
yes. Is this too complex or what's the concern?
> I am fine with both but prefer the
> common KF_RELEASE. Let me know which one you like and I will go ahead
> with it.
Magic KF_RELEASE simplifies things a bit. I don't mind, I guess,
but would like to hear a 3rd opinion.
> > btw do we need a selftest that 2nd task_iter cannot call iter_next()?
> > Like:
> > iter_new(it1);
> > iter_new(it2);
> > iter_next(it1);
> > iter_next(it2); // this one has nothing to do with sleeping
> > but once we think of iter_next as KF_ACQUIRE_LOCK
> > it becomes obvious that it's not ok to re-acquire the same lock.
>
> But we still need to allow this if the locks are different, because
> that will support the nested iterator use case. For example, nested
> task_vma iterators that iterate vmas for different task structs
> (different mm->mmap_lock).
Well, different tasks can have the same mm->mmap_lock.
For example: two threads of the same process.
> And I think the check for re-acquiring the
> lock for the same iterator should be done at runtime as it is done
> currently in task vma iterator. I think what you are saying is that we
> could have a certain class of iterators that take a global lock ->
> only one iterator of that type can be used at a time (nesting not
> supported by design), even that should be taken care at runtime rather
> than the verifier tracking it.
I mean that as long as the iter_new() (before) and iter_next() (after)
implementations use mmap_read_trylock(), it will be fine.
Runtime detection of deadlock works, but we need a selftest.
When iter_new() did trylock() it was borderline acceptable,
but doing it in iter_next() feels suboptimal.
Especially if bpf progs are going to adopt the pattern of
bpf_iter_task_vma_release() that does mmap_unlock
in the loop body, the next iter_next() will rely on luck.
Many vma-s to iterate -> high chance that one of the trylocks will fail.
* Re: [PATCH bpf-next v3 3/6] bpf: Add KF_FORBID_SLEEP modifier for KF_ACQUIRE kfuncs
2026-02-24 15:24 ` Puranjay Mohan
@ 2026-02-24 18:17 ` Eduard Zingerman
2026-02-24 19:41 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 24+ messages in thread
From: Eduard Zingerman @ 2026-02-24 18:17 UTC (permalink / raw)
To: Puranjay Mohan
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
kernel-team
On Tue, 2026-02-24 at 15:24 +0000, Puranjay Mohan wrote:
[...]
> > - remove the following check from do_check_insn():
> >
> > > if (env->cur_state->active_locks) {
> > > if ((insn->src_reg == BPF_REG_0 &&
> > > insn->imm != BPF_FUNC_spin_unlock) ||
> > > (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> > > (insn->off != 0 || !kfunc_spin_allowed(insn->imm)))) {
> > > verbose(env,
> > > "function calls are not allowed while holding a lock\n");
> > > return -EINVAL;
> > > }
> > > }
> >
> > Relying instead on `sleepable && !in_sleepable_context()` checks in
> > check_kfunc_call() and check_helper_call(). (As one more refactoring patch).
>
> But removing the above hunk has other effects, this check is not just
> about sleepable but all function calls under bpf_spin_lock. Do we want
> to relax this now?
Discussed this off-list.
The side-effect of this change is that calls to any non-sleepable
helpers / kfuncs would be allowed while holding a spin lock.
This should be fine, but probably belongs to a separate series.
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-24 18:00 ` Alexei Starovoitov
@ 2026-02-24 18:55 ` Eduard Zingerman
2026-02-24 19:32 ` Alexei Starovoitov
2026-02-24 19:51 ` Puranjay Mohan
0 siblings, 2 replies; 24+ messages in thread
From: Eduard Zingerman @ 2026-02-24 18:55 UTC (permalink / raw)
To: Alexei Starovoitov, Puranjay Mohan
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, Mykyta Yatsenko,
Kernel Team
On Tue, 2026-02-24 at 10:00 -0800, Alexei Starovoitov wrote:
> On Tue, Feb 24, 2026 at 3:24 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
> >
> > On Tue, Feb 24, 2026 at 1:49 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> > > >
> > > >
> > > > Iterator KF_ACQUIRE support is not useful on its own right now, but it
> > > > becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> > > > annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> > > > pointer forbids sleeping; calling _release invalidates the pointer and
> > > > re-enables sleeping."
> > > >
> > >
> > > The name of the flag and semantics bothers me a bit.
> > > How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
> > > And document it like:
> > > KF_ACQUIRE -> acquires a resource via reference counting
> > > KF_ACQUIRE_LOCK -> acquires a resource by locking it
> > >
> > > and to match it I would do:
> > > KF_RELEASE -> releases a resource by decrementing refcount
> > > KF_RELEASE_LOCK -> releases a resource by unlocking it.
> > >
> > > When lock is taken it's typically prohibited to sleep and do
> > > other things. So I feel such flags would describe what's
> > > going on underneath more accurately.
> > >
> > > Thoughts?
> >
> > I am fine with this except, I think we should still keep the
> > KF_RELEASE a single release mechanism that releases according to how
> > the reference was taken (refcount/lock). There is nothing wrong with
> > separate KF_RELEASE and KF_RELEASE_LOCK, except we would have to
> > reject all wrong usages like KF_ACQUIRE -> KF_RELEASE_LOCK,
> > KF_ACQUIRE_LOCK -> KF_RELEASE.
>
> yes. Is this too complex or what's the concern?
>
> > I am fine with both but prefer the
> > common KF_RELEASE. Let me know which one you like and I will go ahead
> > with it.
>
> Magic KF_RELEASE simplifies things a bit. I don't mind, I guess,
> but would like to hear a 3rd opinion.
At the moment verifier tracks the following locking/non-sleepable
resources:
- active_irq_id - nesting allowed, forbids sleep, special stack slot acquire/release logic.
- active_rcu_locks - nesting allowed, forbids sleep, used in in_rcu_cs(),
no special acquire/release logic.
- active_preempt_locks - nesting allowed, forbids sleep, no special acquire/release logic.
- active_locks - nesting disallowed, forbids sleep, no special acquire/release logic.
Currently the verifier checks which kfunc/helper is called and changes
internal state accordingly. It might be possible to slice the above
into a set of "generic" flags/feats, but it requires some thought.
Maybe explore this as a separate series?
[...]
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-24 18:55 ` Eduard Zingerman
@ 2026-02-24 19:32 ` Alexei Starovoitov
2026-02-24 19:47 ` Puranjay Mohan
2026-02-24 19:51 ` Puranjay Mohan
1 sibling, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2026-02-24 19:32 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Puranjay Mohan, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Kernel Team
On Tue, Feb 24, 2026 at 10:55 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2026-02-24 at 10:00 -0800, Alexei Starovoitov wrote:
> > On Tue, Feb 24, 2026 at 3:24 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
> > >
> > > On Tue, Feb 24, 2026 at 1:49 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> > > > >
> > > > >
> > > > > Iterator KF_ACQUIRE support is not useful on its own right now, but it
> > > > > becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> > > > > annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> > > > > pointer forbids sleeping; calling _release invalidates the pointer and
> > > > > re-enables sleeping."
> > > > >
> > > >
> > > > The name of the flag and semantics bothers me a bit.
> > > > How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
> > > > And document it like:
> > > > KF_ACQUIRE -> acquires a resource via reference counting
> > > > KF_ACQUIRE_LOCK -> acquires a resource by locking it
> > > >
> > > > and to match it I would do:
> > > > KF_RELEASE -> releases a resource by decrementing refcount
> > > > KF_RELEASE_LOCK -> releases a resource by unlocking it.
> > > >
> > > > When lock is taken it's typically prohibited to sleep and do
> > > > other things. So I feel such flags would describe what's
> > > > going on underneath more accurately.
> > > >
> > > > Thoughts?
> > >
> > > I am fine with this except, I think we should still keep the
> > > KF_RELEASE a single release mechanism that releases according to how
> > > the reference was taken (refcount/lock). There is nothing wrong with
> > > separate KF_RELEASE and KF_RELEASE_LOCK, except we would have to
> > > reject all wrong usages like KF_ACQUIRE -> KF_RELEASE_LOCK,
> > > KF_ACQUIRE_LOCK -> KF_RELEASE.
> >
> > yes. Is this too complex or what's the concern?
> >
> > > I am fine with both but prefer the
> > > common KF_RELEASE. Let me know which one you like and I will go ahead
> > > with it.
> >
> > Magic KF_RELEASE simplifies things a bit. I don't mind, I guess,
> > but would like to hear a 3rd opinion.
>
> At the moment verifier tracks the following locking/non-sleepable
> resources:
> - active_irq_id - nesting allowed, forbids sleep, special stack slot acquire/release logic.
> - active_rcu_locks - nesting allowed, forbids sleep, used in in_rcu_cs(),
> no special acquire/release logic.
> - active_preempt_locks - nesting allowed, forbids sleep, no special acquire/release logic.
> - active_locks - nesting disallowed, forbids sleep, no special acquire/release logic.
>
> Currently verifier checks what kfunc/helper is called and changes
> internal state accordingly. It might be possible to slice the above
> into a set of "generic" flags / feats, but it requires some thought.
>
> Maybe explore this as a separate series?
That's exactly what I was asking earlier.
Yet another active_* counter adds more tech debt that we would
need to clean up later.
Adding a new one is easy. It fits. It's not obvious how
to generalize them. One needs to think hard to come up with a solution.
I'm ok to defer the work to later, but at some point
we'd have to make a hard stand that another active_* is no go.
* Re: [PATCH bpf-next v3 3/6] bpf: Add KF_FORBID_SLEEP modifier for KF_ACQUIRE kfuncs
2026-02-24 18:17 ` Eduard Zingerman
@ 2026-02-24 19:41 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 24+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-02-24 19:41 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Puranjay Mohan, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Mykyta Yatsenko, kernel-team
On Tue, 24 Feb 2026 at 19:17, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2026-02-24 at 15:24 +0000, Puranjay Mohan wrote:
>
> [...]
>
> > > - remove the following check from do_check_insn():
> > >
> > > > if (env->cur_state->active_locks) {
> > > > if ((insn->src_reg == BPF_REG_0 &&
> > > > insn->imm != BPF_FUNC_spin_unlock) ||
> > > > (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> > > > (insn->off != 0 || !kfunc_spin_allowed(insn->imm)))) {
> > > > verbose(env,
> > > > "function calls are not allowed while holding a lock\n");
> > > > return -EINVAL;
> > > > }
> > > > }
> > >
> > > Relying instead on `sleepable && !in_sleepable_context()` checks in
> > > check_kfunc_call() and check_helper_call(). (As one more refactoring patch).
> >
> > But removing the above hunk has other effects, this check is not just
> > about sleepable but all function calls under bpf_spin_lock. Do we want
> > to relax this now?
>
> Discussed this off-list.
> The side-effect of this change is that calls to any non-sleepable
> helpers / kfuncs would be allowed while holding a spin lock.
> This should be fine, but probably belongs to a separate series.
I think there are some problems with opening this up, e.g. we can
deadlock if a bpf_map_update_elem() takes the same bpf spin lock as
the one in the map value.
Definitely shouldn't be done as part of this set.
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-24 19:32 ` Alexei Starovoitov
@ 2026-02-24 19:47 ` Puranjay Mohan
0 siblings, 0 replies; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-24 19:47 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eduard Zingerman, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Kernel Team
On Tue, Feb 24, 2026 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 24, 2026 at 10:55 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Tue, 2026-02-24 at 10:00 -0800, Alexei Starovoitov wrote:
> > > On Tue, Feb 24, 2026 at 3:24 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 24, 2026 at 1:49 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> > > > > >
> > > > > >
> > > > > > Iterator KF_ACQUIRE support is not useful on its own right now, but it
> > > > > > becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> > > > > > annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> > > > > > pointer forbids sleeping; calling _release invalidates the pointer and
> > > > > > re-enables sleeping."
> > > > > >
> > > > >
> > > > > The name of the flag and semantics bothers me a bit.
> > > > > How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
> > > > > And document it like:
> > > > > KF_ACQUIRE -> acquires a resource via reference counting
> > > > > KF_ACQUIRE_LOCK -> acquires a resource by locking it
> > > > >
> > > > > and to match it I would do:
> > > > > KF_RELEASE -> releases a resource by decrementing refcount
> > > > > KF_RELEASE_LOCK -> releases a resource by unlocking it.
> > > > >
> > > > > When lock is taken it's typically prohibited to sleep and do
> > > > > other things. So I feel such flags would describe what's
> > > > > going on underneath more accurately.
> > > > >
> > > > > Thoughts?
> > > >
> > > > I am fine with this except, I think we should still keep the
> > > > KF_RELEASE a single release mechanism that releases according to how
> > > > the reference was taken (refcount/lock). There is nothing wrong with
> > > > separate KF_RELEASE and KF_RELEASE_LOCK, except we would have to
> > > > reject all wrong usages like KF_ACQUIRE -> KF_RELEASE_LOCK,
> > > > KF_ACQUIRE_LOCK -> KF_RELEASE.
> > >
> > > yes. Is this too complex or what's the concern?
> > >
> > > > I am fine with both but prefer the
> > > > common KF_RELEASE. Let me know which one you like and I will go ahead
> > > > with it.
> > >
> > > Magic KF_RELEASE simplifies things a bit. I don't mind, I guess,
> > > but would like to hear a 3rd opinion.
> >
> > At the moment verifier tracks the following locking/non-sleepable
> > resources:
> > - active_irq_id - nesting allowed, forbids sleep, special stack slot acquire/release logic.
> > - active_rcu_locks - nesting allowed, forbids sleep, used in in_rcu_cs(),
> > no special acquire/release logic.
> > - active_preempt_locks - nesting allowed, forbids sleep, no special acquire/release logic.
> > - active_locks - nesting disallowed, forbids sleep, no special acquire/release logic.
> >
> > Currently verifier checks what kfunc/helper is called and changes
> > internal state accordingly. It might be possible to slice the above
> > into a set of "generic" flags / feats, but it requires some thought.
> >
> > Maybe explore this as a separate series?
>
> That's exactly what I was asking earlier.
> Yet another active_* counter adds more tech debt that we would
> need to clean up later.
> Adding a new one is easy. It fits. It's not obvious how
> to generalize them. One needs to think hard to come up with a solution.
> I'm ok to defer the work to later, but at some point
> we'd have to make a hard stand that another active_* is no go.
I will do that next after this work lands.
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-24 18:55 ` Eduard Zingerman
2026-02-24 19:32 ` Alexei Starovoitov
@ 2026-02-24 19:51 ` Puranjay Mohan
2026-02-24 20:15 ` Eduard Zingerman
1 sibling, 1 reply; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-24 19:51 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Kernel Team
On Tue, Feb 24, 2026 at 6:55 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2026-02-24 at 10:00 -0800, Alexei Starovoitov wrote:
> > On Tue, Feb 24, 2026 at 3:24 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
> > >
> > > On Tue, Feb 24, 2026 at 1:49 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> > > > >
> > > > >
> > > > > Iterator KF_ACQUIRE support is not useful on its own right now, but it
> > > > > becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> > > > > annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> > > > > pointer forbids sleeping; calling _release invalidates the pointer and
> > > > > re-enables sleeping."
> > > > >
> > > >
> > > > The name of the flag and semantics bothers me a bit.
> > > > How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
> > > > And document it like:
> > > > KF_ACQUIRE -> acquires a resource via reference counting
> > > > KF_ACQUIRE_LOCK -> acquires a resource by locking it
> > > >
> > > > and to match it I would do:
> > > > KF_RELEASE -> releases a resource by decrementing refcount
> > > > KF_RELEASE_LOCK -> releases a resource by unlocking it.
> > > >
> > > > When lock is taken it's typically prohibited to sleep and do
> > > > other things. So I feel such flags would describe what's
> > > > going on underneath more accurately.
> > > >
> > > > Thoughts?
> > >
> > > I am fine with this except, I think we should still keep the
> > > KF_RELEASE a single release mechanism that releases according to how
> > > the reference was taken (refcount/lock). There is nothing wrong with
> > > separate KF_RELEASE and KF_RELEASE_LOCK, except we would have to
> > > reject all wrong usages like KF_ACQUIRE -> KF_RELEASE_LOCK,
> > > KF_ACQUIRE_LOCK -> KF_RELEASE.
> >
> > yes. Is this too complex or what's the concern?
> >
> > > I am fine with both but prefer the
> > > common KF_RELEASE. Let me know which one you like and I will go ahead
> > > with it.
> >
> > Magic KF_RELEASE simplifies things a bit. I don't mind, I guess,
> > but would like to hear a 3rd opinion.
>
> At the moment verifier tracks the following locking/non-sleepable
> resources:
> - active_irq_id - nesting allowed, forbids sleep, special stack slot acquire/release logic.
> - active_rcu_locks - nesting allowed, forbids sleep, used in in_rcu_cs(),
> no special acquire/release logic.
> - active_preempt_locks - nesting allowed, forbids sleep, no special acquire/release logic.
> - active_locks - nesting disallowed, forbids sleep, no special acquire/release logic.
>
> Currently verifier checks what kfunc/helper is called and changes
> internal state accordingly. It might be possible to slice the above
> into a set of "generic" flags / feats, but it requires some thought.
>
> Maybe explore this as a separate series?
Eduard,
Are you fine with making KF_ACQUIRE | KF_FORBID_SLEEP a new flag
called KF_ACQUIRE_LOCK, and then keeping the common KF_RELEASE that
releases according to how the reference was acquired?
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-24 19:51 ` Puranjay Mohan
@ 2026-02-24 20:15 ` Eduard Zingerman
2026-02-24 20:20 ` Puranjay Mohan
0 siblings, 1 reply; 24+ messages in thread
From: Eduard Zingerman @ 2026-02-24 20:15 UTC (permalink / raw)
To: Puranjay Mohan
Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Kernel Team
On Tue, 2026-02-24 at 19:51 +0000, Puranjay Mohan wrote:
> On Tue, Feb 24, 2026 at 6:55 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Tue, 2026-02-24 at 10:00 -0800, Alexei Starovoitov wrote:
> > > On Tue, Feb 24, 2026 at 3:24 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 24, 2026 at 1:49 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> > > > > >
> > > > > >
> > > > > > Iterator KF_ACQUIRE support is not useful on its own right now, but it
> > > > > > becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> > > > > > annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> > > > > > pointer forbids sleeping; calling _release invalidates the pointer and
> > > > > > re-enables sleeping."
> > > > > >
> > > > >
> > > > > The name of the flag and semantics bothers me a bit.
> > > > > How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
> > > > > And document it like:
> > > > > KF_ACQUIRE -> acquires a resource via reference counting
> > > > > KF_ACQUIRE_LOCK -> acquires a resource by locking it
> > > > >
> > > > > and to match it I would do:
> > > > > KF_RELEASE -> releases a resource by decrementing refcount
> > > > > KF_RELEASE_LOCK -> releases a resource by unlocking it.
> > > > >
> > > > > When lock is taken it's typically prohibited to sleep and do
> > > > > other things. So I feel such flags would describe what's
> > > > > going on underneath more accurately.
> > > > >
> > > > > Thoughts?
> > > >
> > > > I am fine with this except, I think we should still keep the
> > > > KF_RELEASE a single release mechanism that releases according to how
> > > > the reference was taken (refcount/lock). There is nothing wrong with
> > > > separate KF_RELEASE and KF_RELEASE_LOCK, except we would have to
> > > > reject all wrong usages like KF_ACQUIRE -> KF_RELEASE_LOCK,
> > > > KF_ACQUIRE_LOCK -> KF_RELEASE.
> > >
> > > yes. Is this too complex or what's the concern?
> > >
> > > > I am fine with both but prefer the
> > > > common KF_RELEASE. Let me know which one you like and I will go ahead
> > > > with it.
> > >
> > > Magic KF_RELEASE simplifies things a bit. I don't mind, I guess,
> > > but would like to hear a 3rd opinion.
> >
> > At the moment verifier tracks the following locking/non-sleepable
> > resources:
> > - active_irq_id - nesting allowed, forbids sleep, special stack slot acquire/release logic.
> > - active_rcu_locks - nesting allowed, forbids sleep, used in in_rcu_cs(),
> > no special acquire/release logic.
> > - active_preempt_locks - nesting allowed, forbids sleep, no special acquire/release logic.
> > - active_locks - nesting disallowed, forbids sleep, no special acquire/release logic.
> >
> > Currently verifier checks what kfunc/helper is called and changes
> > internal state accordingly. It might be possible to slice the above
> > into a set of "generic" flags / feats, but it requires some thought.
> >
> > Maybe explore this as a separate series?
>
> Eduard,
>
> Are you fine with making KF_ACQUIRE | KF_FORBID_SLEEP as a new flag
> called KF_ACQUIRE_LOCK
> and then keeping the common KF_RELEASE that will release according to
> how it was acquired?
Idk, I'd keep it KF_FORBID_SLEEP for now and rename later.
W/o the discussed refactoring ACQUIRE_LOCK is a bit confusing, imo.
* Re: [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs
2026-02-24 20:15 ` Eduard Zingerman
@ 2026-02-24 20:20 ` Puranjay Mohan
0 siblings, 0 replies; 24+ messages in thread
From: Puranjay Mohan @ 2026-02-24 20:20 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Mykyta Yatsenko, Kernel Team
On Tue, Feb 24, 2026 at 8:15 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2026-02-24 at 19:51 +0000, Puranjay Mohan wrote:
> > On Tue, Feb 24, 2026 at 6:55 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Tue, 2026-02-24 at 10:00 -0800, Alexei Starovoitov wrote:
> > > > On Tue, Feb 24, 2026 at 3:24 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
> > > > >
> > > > > On Tue, Feb 24, 2026 at 1:49 AM Alexei Starovoitov
> > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 23, 2026 at 9:48 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Iterator KF_ACQUIRE support is not useful on its own right now, but it
> > > > > > > becomes the foundation for KF_FORBID_SLEEP: an iterator whose _next is
> > > > > > > annotated with KF_ACQUIRE | KF_FORBID_SLEEP can now express "holding this
> > > > > > > pointer forbids sleeping; calling _release invalidates the pointer and
> > > > > > > re-enables sleeping."
> > > > > > >
> > > > > >
> > > > > > The name of the flag and semantics bothers me a bit.
> > > > > > How about we call it KF_ACQUIRE_LOCK and make it exclusive vs KF_ACQUIRE.
> > > > > > And document it like:
> > > > > > KF_ACQUIRE -> acquires a resource via reference counting
> > > > > > KF_ACQUIRE_LOCK -> acquires a resource by locking it
> > > > > >
> > > > > > and to match it I would do:
> > > > > > KF_RELEASE -> releases a resource by decrementing refcount
> > > > > > KF_RELEASE_LOCK -> releases a resource by unlocking it.
> > > > > >
> > > > > > When lock is taken it's typically prohibited to sleep and do
> > > > > > other things. So I feel such flags would describe what's
> > > > > > going on underneath more accurately.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > I am fine with this except, I think we should still keep the
> > > > > KF_RELEASE a single release mechanism that releases according to how
> > > > > the reference was taken (refcount/lock). There is nothing wrong with
> > > > > separate KF_RELEASE and KF_RELEASE_LOCK, except we would have to
> > > > > reject all wrong usages like KF_ACQUIRE -> KF_RELEASE_LOCK,
> > > > > KF_ACQUIRE_LOCK -> KF_RELEASE.
> > > >
> > > > yes. Is this too complex or what's the concern?
> > > >
> > > > > I am fine with both but prefer the
> > > > > common KF_RELEASE. Let me know which one you like and I will go ahead
> > > > > with it.
> > > >
> > > > Magic KF_RELEASE simplifies things a bit. I don't mind, I guess,
> > > > but would like to hear a 3rd opinion.
> > >
> > > At the moment verifier tracks the following locking/non-sleepable
> > > resources:
> > > - active_irq_id - nesting allowed, forbids sleep, special stack slot acquire/release logic.
> > > - active_rcu_locks - nesting allowed, forbids sleep, used in in_rcu_cs(),
> > >   no special acquire/release logic.
> > > - active_preempt_locks - nesting allowed, forbids sleep, no special acquire/release logic.
> > > - active_locks - nesting disallowed, forbids sleep, no special acquire/release logic.
> > >
> > > Currently verifier checks what kfunc/helper is called and changes
> > > internal state accordingly. It might be possible to slice the above
> > > into a set of "generic" flags / feats, but it requires some thought.
> > >
> > > Maybe explore this as a separate series?
> >
> > Eduard,
> >
> > Are you fine with making KF_ACQUIRE | KF_FORBIDS_SLEEP as a new flag
> > called KF_ACQUIRE_LOCK
> > and then keeping the common KF_RELEASE that will release according to
> > how it was acquired?
>
> Idk, I'd keep it KF_FORBIDS_SLEEP for now and rename later.
> W/o the discussed refactoring ACQUIRE_LOCK is a bit confusing, imo.
Okay, I will keep it as is then and send v4 with the requested changes.
Will rename this in the next set, which refactors these active_* states.
end of thread, other threads:[~2026-02-24 20:20 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-23 17:46 [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs Puranjay Mohan
2026-02-23 17:46 ` [PATCH bpf-next v3 1/6] bpf: Add KF_ACQUIRE and KF_RELEASE support for iterators Puranjay Mohan
2026-02-23 19:59 ` Mykyta Yatsenko
2026-02-23 20:41 ` Eduard Zingerman
2026-02-23 17:46 ` [PATCH bpf-next v3 2/6] bpf: consolidate sleepable context error message printing Puranjay Mohan
2026-02-23 20:06 ` Mykyta Yatsenko
2026-02-23 20:27 ` Eduard Zingerman
2026-02-23 17:46 ` [PATCH bpf-next v3 3/6] bpf: Add KF_FORBID_SLEEP modifier for KF_ACQUIRE kfuncs Puranjay Mohan
2026-02-23 22:14 ` Eduard Zingerman
2026-02-24 15:24 ` Puranjay Mohan
2026-02-24 18:17 ` Eduard Zingerman
2026-02-24 19:41 ` Kumar Kartikeya Dwivedi
2026-02-23 17:46 ` [PATCH bpf-next v3 4/6] bpf: Move locking to bpf_iter_task_vma_next() Puranjay Mohan
2026-02-23 17:46 ` [PATCH bpf-next v3 5/6] bpf: Add split iteration support to task_vma iterator Puranjay Mohan
2026-02-23 17:46 ` [PATCH bpf-next v3 6/6] selftests/bpf: Add tests for split " Puranjay Mohan
2026-02-24 1:49 ` [PATCH bpf-next v3 0/6] Introduce KF_FORBID_SLEEP modifier for acquire/release kfuncs Alexei Starovoitov
2026-02-24 11:24 ` Puranjay Mohan
2026-02-24 18:00 ` Alexei Starovoitov
2026-02-24 18:55 ` Eduard Zingerman
2026-02-24 19:32 ` Alexei Starovoitov
2026-02-24 19:47 ` Puranjay Mohan
2026-02-24 19:51 ` Puranjay Mohan
2026-02-24 20:15 ` Eduard Zingerman
2026-02-24 20:20 ` Puranjay Mohan