* [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq
@ 2026-02-01 2:53 Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context Alexei Starovoitov
` (9 more replies)
0 siblings, 10 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:53 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Alexei Starovoitov <ast@kernel.org>
This series reworks the implementation of the BPF timer and workqueue APIs
to make them usable from any context.
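For illustration only (not part of the patches in this series): a minimal
BPF program arming a bpf_timer from a kprobe, which this rework makes
possible. Program, map and attach-point names below are invented for the
example.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct elem {
	struct bpf_timer t;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} timers SEC(".maps");

static int timer_cb(void *map, int *key, struct bpf_timer *timer)
{
	return 0;
}

SEC("kprobe/do_nanosleep")
int arm_from_kprobe(void *ctx)
{
	int key = 0;
	struct bpf_timer *timer;

	timer = bpf_map_lookup_elem(&timers, &key);
	if (!timer)
		return 0;
	bpf_timer_init(timer, &timers, 1 /* CLOCK_MONOTONIC */);
	bpf_timer_set_callback(timer, timer_cb);
	bpf_timer_start(timer, 0, 0);
	return 0;
}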
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Changes in v9:
- Different approach for patches 1 and 3:
- s/EBUSY/ENOENT/ when refcnt==0 to match existing
- drop latch, use refcnt and kmalloc_nolock() instead
- address race between timer/wq_start and delete_elem, add a test
- Link to v8: https://lore.kernel.org/bpf/20260127-timer_nolock-v8-0-5a29a9571059@meta.com/
Changes in v8:
- Return -EBUSY in bpf_async_read_op() if setting last_seq fails
- In bpf_async_cancel_and_free() drop bpf_async_cb ref after calling bpf_async_process()
- Link to v7: https://lore.kernel.org/r/20260122-timer_nolock-v7-0-04a45c55c2e2@meta.com
Changes in v7:
- Addressed Andrii's review points from the previous version - nothing
very significant.
- Added NMI stress tests for bpf_timer - hit a few failing verifier
checks and removed them.
- Addressed a sparse warning in bpf_async_update_prog_callback()
- Link to v6: https://lore.kernel.org/r/20260120-timer_nolock-v6-0-670ffdd787b4@meta.com
Changes in v6:
- Reworked destruction and refcnt use:
- On cancel_and_free() set last_seq to BPF_ASYNC_DESTROY value, drop
map's reference
- In irq work callback, atomically switch DESTROY to DESTROYED, cancel
timer/wq
- Free bpf_async_cb on refcnt going to 0.
- Link to v5: https://lore.kernel.org/r/20260115-timer_nolock-v5-0-15e3aef2703d@meta.com
Changes in v5:
- Extracted the lock-free algorithm for updating cb->prog and
cb->callback_fn into a new function, bpf_async_update_prog_callback(),
and added a new commit that introduces this function and uses it in
__bpf_async_set_callback(), bpf_timer_cancel() and
bpf_async_cancel_and_free().
This allows moving the change into a separate commit without breaking
correctness.
- Handle NULL prog in bpf_async_update_prog_callback().
- Link to v4: https://lore.kernel.org/r/20260114-timer_nolock-v4-0-fa6355f51fa7@meta.com
Changes in v4:
- Handle irq_work_queue failures in both schedule and cancel_and_free
paths: introduced bpf_async_refcnt_dec_cleanup() that decrements refcnt
and makes sure that, if the last reference is put, at least one irq_work
is scheduled to execute the final cleanup.
- Additional refcnt inc/dec in set_callback() + rcu lock to make sure
cleanup is not running at the same time as set_callback().
- Added READ_ONCE where it was needed.
- Squash 'bpf: Refactor __bpf_async_set_callback()' commit into
'bpf: Add lock-free cell for NMI-safe async operations'
- Removed mpmc_cell, use seqcount_latch_t instead.
- Link to v3: https://lore.kernel.org/r/20260107-timer_nolock-v3-0-740d3ec3e5f9@meta.com
Changes in v3:
- Major rework
- Introduce mpmc_cell, allowing concurrent writes and reads
- Implement irq_work deferral
- Add selftests
- Introduce bpf_timer_cancel_async kfunc
- Link to v2: https://lore.kernel.org/r/20251105-timer_nolock-v2-0-32698db08bfa@meta.com
Changes in v2:
- Move refcnt initialization and put (from cancel_and_free())
from patch 5 into patch 4, so that patch 4 has a clearer and more
complete implementation and use of refcnt
- Link to v1: https://lore.kernel.org/r/20251031-timer_nolock-v1-0-b064ae403bfb@meta.com
Alexei Starovoitov (3):
bpf: Enable bpf_timer and bpf_wq in any context
bpf: Introduce bpf_timer_cancel_async() kfunc
selftests/bpf: Add a test to stress bpf_timer_start and map_delete
race
Mykyta Yatsenko (6):
bpf: Add verifier support for bpf_timer argument in kfuncs
selftests/bpf: Refactor timer selftests
selftests/bpf: Add stress test for timer async cancel
selftests/bpf: Verify bpf_timer_cancel_async works
selftests/bpf: Add timer stress test in NMI context
selftests/bpf: Removed obsolete tests
kernel/bpf/helpers.c | 456 +++++++++++-------
kernel/bpf/verifier.c | 55 ++-
.../testing/selftests/bpf/prog_tests/timer.c | 250 +++++++++-
.../bpf/prog_tests/timer_start_delete_race.c | 137 ++++++
tools/testing/selftests/bpf/progs/timer.c | 118 ++++-
.../bpf/progs/timer_start_delete_race.c | 66 +++
.../bpf/progs/verifier_helper_restricted.c | 111 -----
7 files changed, 851 insertions(+), 342 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/timer_start_delete_race.c
create mode 100644 tools/testing/selftests/bpf/progs/timer_start_delete_race.c
--
2.47.3
* [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
@ 2026-02-01 2:53 ` Alexei Starovoitov
2026-02-02 13:36 ` Mykyta Yatsenko
` (2 more replies)
2026-02-01 2:53 ` [PATCH v9 bpf-next 2/9] bpf: Add verifier support for bpf_timer argument in kfuncs Alexei Starovoitov
` (8 subsequent siblings)
9 siblings, 3 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:53 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Alexei Starovoitov <ast@kernel.org>
Refactor bpf_timer and bpf_wq to allow calling them from any context:
- add refcnt to bpf_async_cb
- map_delete_elem or map_free will drop refcnt to zero
via bpf_async_cancel_and_free()
- once refcnt is zero, timer/wq_start is no longer allowed, so that
the callback cannot rearm itself
- if in_hardirq(), defer start/cancel operations to irq_work; a condensed
sketch of this path follows
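A condensed sketch of the hardirq deferral path (simplified from
bpf_timer_start() and bpf_async_schedule_op() in the diff below; the wq
path and error handling are elided, and the wrapper name is invented for
illustration):

static int timer_start_sketch(struct bpf_hrtimer *t, u64 nsecs, u32 mode)
{
	/* refcnt == 0 means cancel_and_free() already ran: refuse to re-arm */
	if (!refcount_inc_not_zero(&t->cb.refcnt))
		return -ENOENT;

	if (!in_hardirq()) {
		/* process context / softirq: arm the hrtimer directly */
		hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
		bpf_async_refcount_put(&t->cb);
		return 0;
	}
	/*
	 * hardirq/NMI: queue a BPF_ASYNC_START command on a lockless list
	 * and kick irq_work; the reference is dropped once the command runs.
	 */
	return bpf_async_schedule_op(&t->cb, BPF_ASYNC_START, nsecs, mode);
}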
Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/helpers.c | 408 ++++++++++++++++++++++++-------------------
1 file changed, 225 insertions(+), 183 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index b54ec0e945aa..2eb262d52232 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1095,16 +1095,34 @@ static void *map_key_from_value(struct bpf_map *map, void *value, u32 *arr_idx)
return (void *)value - round_up(map->key_size, 8);
}
+enum bpf_async_type {
+ BPF_ASYNC_TYPE_TIMER = 0,
+ BPF_ASYNC_TYPE_WQ,
+};
+
+enum bpf_async_op {
+ BPF_ASYNC_START,
+ BPF_ASYNC_CANCEL
+};
+
+struct bpf_async_cmd {
+ struct llist_node node;
+ u64 nsec;
+ u32 mode;
+ enum bpf_async_op op;
+};
+
struct bpf_async_cb {
struct bpf_map *map;
struct bpf_prog *prog;
void __rcu *callback_fn;
void *value;
- union {
- struct rcu_head rcu;
- struct work_struct delete_work;
- };
+ struct rcu_head rcu;
u64 flags;
+ struct irq_work worker;
+ refcount_t refcnt;
+ enum bpf_async_type type;
+ struct llist_head async_cmds;
};
/* BPF map elements can contain 'struct bpf_timer'.
@@ -1132,7 +1150,6 @@ struct bpf_hrtimer {
struct bpf_work {
struct bpf_async_cb cb;
struct work_struct work;
- struct work_struct delete_work;
};
/* the actual struct hidden inside uapi struct bpf_timer and bpf_wq */
@@ -1142,20 +1159,12 @@ struct bpf_async_kern {
struct bpf_hrtimer *timer;
struct bpf_work *work;
};
- /* bpf_spin_lock is used here instead of spinlock_t to make
- * sure that it always fits into space reserved by struct bpf_timer
- * regardless of LOCKDEP and spinlock debug flags.
- */
- struct bpf_spin_lock lock;
} __attribute__((aligned(8)));
-enum bpf_async_type {
- BPF_ASYNC_TYPE_TIMER = 0,
- BPF_ASYNC_TYPE_WQ,
-};
-
static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
+static void bpf_async_refcount_put(struct bpf_async_cb *cb);
+
static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
{
struct bpf_hrtimer *t = container_of(hrtimer, struct bpf_hrtimer, timer);
@@ -1219,45 +1228,73 @@ static void bpf_async_cb_rcu_free(struct rcu_head *rcu)
{
struct bpf_async_cb *cb = container_of(rcu, struct bpf_async_cb, rcu);
+ /*
+ * Drop the last reference to prog only after RCU GP, as set_callback()
+ * may race with cancel_and_free()
+ */
+ if (cb->prog)
+ bpf_prog_put(cb->prog);
+
kfree_nolock(cb);
}
-static void bpf_wq_delete_work(struct work_struct *work)
+/* Callback from call_rcu_tasks_trace, chains to call_rcu for final free */
+static void bpf_async_cb_rcu_tasks_trace_free(struct rcu_head *rcu)
{
- struct bpf_work *w = container_of(work, struct bpf_work, delete_work);
+ struct bpf_async_cb *cb = container_of(rcu, struct bpf_async_cb, rcu);
+ struct bpf_hrtimer *t = container_of(cb, struct bpf_hrtimer, cb);
+ struct bpf_work *w = container_of(cb, struct bpf_work, cb);
+ bool retry = false;
- cancel_work_sync(&w->work);
+ /*
+ * bpf_async_cancel_and_free() tried to cancel timer/wq, but it
+ * could have raced with timer/wq_start. Now refcnt is zero and
+ * srcu/rcu GP completed. Cancel timer/wq again.
+ */
+ switch (cb->type) {
+ case BPF_ASYNC_TYPE_TIMER:
+ if (hrtimer_try_to_cancel(&t->timer) < 0)
+ retry = true;
+ break;
+ case BPF_ASYNC_TYPE_WQ:
+ if (!cancel_work(&w->work))
+ retry = true;
+ break;
+ }
+ if (retry) {
+ /*
+ * hrtimer or wq callback may still be running. It must be
+ * in rcu_tasks_trace or rcu CS, so wait for GP again.
+ * It won't retry forever, since refcnt zero prevents all
+ * operations on timer/wq.
+ */
+ call_rcu_tasks_trace(&cb->rcu, bpf_async_cb_rcu_tasks_trace_free);
+ return;
+ }
- call_rcu(&w->cb.rcu, bpf_async_cb_rcu_free);
+ /* rcu_trace_implies_rcu_gp() is true and will remain so */
+ bpf_async_cb_rcu_free(rcu);
}
-static void bpf_timer_delete_work(struct work_struct *work)
+static void bpf_async_refcount_put(struct bpf_async_cb *cb)
{
- struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, cb.delete_work);
+ if (!refcount_dec_and_test(&cb->refcnt))
+ return;
- /* Cancel the timer and wait for callback to complete if it was running.
- * If hrtimer_cancel() can be safely called it's safe to call
- * call_rcu() right after for both preallocated and non-preallocated
- * maps. The async->cb = NULL was already done and no code path can see
- * address 't' anymore. Timer if armed for existing bpf_hrtimer before
- * bpf_timer_cancel_and_free will have been cancelled.
- */
- hrtimer_cancel(&t->timer);
- call_rcu(&t->cb.rcu, bpf_async_cb_rcu_free);
+ call_rcu_tasks_trace(&cb->rcu, bpf_async_cb_rcu_tasks_trace_free);
}
+static void bpf_async_cancel_and_free(struct bpf_async_kern *async);
+static void bpf_async_irq_worker(struct irq_work *work);
+
static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
enum bpf_async_type type)
{
- struct bpf_async_cb *cb;
+ struct bpf_async_cb *cb, *old_cb;
struct bpf_hrtimer *t;
struct bpf_work *w;
clockid_t clockid;
size_t size;
- int ret = 0;
-
- if (in_nmi())
- return -EOPNOTSUPP;
switch (type) {
case BPF_ASYNC_TYPE_TIMER:
@@ -1270,18 +1307,13 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
return -EINVAL;
}
- __bpf_spin_lock_irqsave(&async->lock);
- t = async->timer;
- if (t) {
- ret = -EBUSY;
- goto out;
- }
+ old_cb = READ_ONCE(async->cb);
+ if (old_cb)
+ return -EBUSY;
cb = bpf_map_kmalloc_nolock(map, size, 0, map->numa_node);
- if (!cb) {
- ret = -ENOMEM;
- goto out;
- }
+ if (!cb)
+ return -ENOMEM;
switch (type) {
case BPF_ASYNC_TYPE_TIMER:
@@ -1289,7 +1321,6 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
t = (struct bpf_hrtimer *)cb;
atomic_set(&t->cancelling, 0);
- INIT_WORK(&t->cb.delete_work, bpf_timer_delete_work);
hrtimer_setup(&t->timer, bpf_timer_cb, clockid, HRTIMER_MODE_REL_SOFT);
cb->value = (void *)async - map->record->timer_off;
break;
@@ -1297,16 +1328,24 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
w = (struct bpf_work *)cb;
INIT_WORK(&w->work, bpf_wq_work);
- INIT_WORK(&w->delete_work, bpf_wq_delete_work);
cb->value = (void *)async - map->record->wq_off;
break;
}
cb->map = map;
cb->prog = NULL;
cb->flags = flags;
+ cb->worker = IRQ_WORK_INIT(bpf_async_irq_worker);
+ init_llist_head(&cb->async_cmds);
+ refcount_set(&cb->refcnt, 1); /* map's reference */
+ cb->type = type;
rcu_assign_pointer(cb->callback_fn, NULL);
- WRITE_ONCE(async->cb, cb);
+ old_cb = cmpxchg(&async->cb, NULL, cb);
+ if (old_cb) {
+ /* Lost the race to initialize this bpf_async_kern, drop the allocated object */
+ kfree_nolock(cb);
+ return -EBUSY;
+ }
/* Guarantee the order between async->cb and map->usercnt. So
* when there are concurrent uref release and bpf timer init, either
* bpf_timer_cancel_and_free() called by uref release reads a no-NULL
@@ -1317,13 +1356,11 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
/* maps with timers must be either held by user space
* or pinned in bpffs.
*/
- WRITE_ONCE(async->cb, NULL);
- kfree_nolock(cb);
- ret = -EPERM;
+ bpf_async_cancel_and_free(async);
+ return -EPERM;
}
-out:
- __bpf_spin_unlock_irqrestore(&async->lock);
- return ret;
+
+ return 0;
}
BPF_CALL_3(bpf_timer_init, struct bpf_async_kern *, timer, struct bpf_map *, map,
@@ -1354,8 +1391,9 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
.arg3_type = ARG_ANYTHING,
};
-static int bpf_async_update_prog_callback(struct bpf_async_cb *cb, void *callback_fn,
- struct bpf_prog *prog)
+static int bpf_async_update_prog_callback(struct bpf_async_cb *cb,
+ struct bpf_prog *prog,
+ void *callback_fn)
{
struct bpf_prog *prev;
@@ -1380,7 +1418,8 @@ static int bpf_async_update_prog_callback(struct bpf_async_cb *cb, void *callbac
if (prev)
bpf_prog_put(prev);
- } while (READ_ONCE(cb->prog) != prog || READ_ONCE(cb->callback_fn) != callback_fn);
+ } while (READ_ONCE(cb->prog) != prog ||
+ (void __force *)READ_ONCE(cb->callback_fn) != callback_fn);
if (prog)
bpf_prog_put(prog);
@@ -1388,33 +1427,36 @@ static int bpf_async_update_prog_callback(struct bpf_async_cb *cb, void *callbac
return 0;
}
+static int bpf_async_schedule_op(struct bpf_async_cb *cb, enum bpf_async_op op,
+ u64 nsec, u32 timer_mode)
+{
+ WARN_ON_ONCE(!in_hardirq());
+
+ struct bpf_async_cmd *cmd = kmalloc_nolock(sizeof(*cmd), 0, NUMA_NO_NODE);
+
+ if (!cmd) {
+ bpf_async_refcount_put(cb);
+ return -ENOMEM;
+ }
+ init_llist_node(&cmd->node);
+ cmd->nsec = nsec;
+ cmd->mode = timer_mode;
+ cmd->op = op;
+ if (llist_add(&cmd->node, &cb->async_cmds))
+ irq_work_queue(&cb->worker);
+ return 0;
+}
+
static int __bpf_async_set_callback(struct bpf_async_kern *async, void *callback_fn,
struct bpf_prog *prog)
{
struct bpf_async_cb *cb;
- int ret = 0;
- if (in_nmi())
- return -EOPNOTSUPP;
- __bpf_spin_lock_irqsave(&async->lock);
- cb = async->cb;
- if (!cb) {
- ret = -EINVAL;
- goto out;
- }
- if (!atomic64_read(&cb->map->usercnt)) {
- /* maps with timers must be either held by user space
- * or pinned in bpffs. Otherwise timer might still be
- * running even when bpf prog is detached and user space
- * is gone, since map_release_uref won't ever be called.
- */
- ret = -EPERM;
- goto out;
- }
- ret = bpf_async_update_prog_callback(cb, callback_fn, prog);
-out:
- __bpf_spin_unlock_irqrestore(&async->lock);
- return ret;
+ cb = READ_ONCE(async->cb);
+ if (!cb)
+ return -EINVAL;
+
+ return bpf_async_update_prog_callback(cb, prog, callback_fn);
}
BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callback_fn,
@@ -1431,22 +1473,17 @@ static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.arg2_type = ARG_PTR_TO_FUNC,
};
-BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, flags)
+BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, async, u64, nsecs, u64, flags)
{
struct bpf_hrtimer *t;
- int ret = 0;
- enum hrtimer_mode mode;
+ u32 mode;
- if (in_nmi())
- return -EOPNOTSUPP;
if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN))
return -EINVAL;
- __bpf_spin_lock_irqsave(&timer->lock);
- t = timer->timer;
- if (!t || !t->cb.prog) {
- ret = -EINVAL;
- goto out;
- }
+
+ t = READ_ONCE(async->timer);
+ if (!t || !READ_ONCE(t->cb.prog))
+ return -EINVAL;
if (flags & BPF_F_TIMER_ABS)
mode = HRTIMER_MODE_ABS_SOFT;
@@ -1456,10 +1493,20 @@ BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, fla
if (flags & BPF_F_TIMER_CPU_PIN)
mode |= HRTIMER_MODE_PINNED;
- hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
-out:
- __bpf_spin_unlock_irqrestore(&timer->lock);
- return ret;
+ /*
+ * bpf_async_cancel_and_free() could have dropped refcnt to zero. In
+ * such case BPF progs are not allowed to arm the timer to prevent UAF.
+ */
+ if (!refcount_inc_not_zero(&t->cb.refcnt))
+ return -ENOENT;
+
+ if (!in_hardirq()) {
+ hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
+ bpf_async_refcount_put(&t->cb);
+ return 0;
+ } else {
+ return bpf_async_schedule_op(&t->cb, BPF_ASYNC_START, nsecs, mode);
+ }
}
static const struct bpf_func_proto bpf_timer_start_proto = {
@@ -1477,11 +1524,9 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, async)
bool inc = false;
int ret = 0;
- if (in_nmi())
+ if (in_hardirq())
return -EOPNOTSUPP;
- guard(rcu)();
-
t = READ_ONCE(async->timer);
if (!t)
return -EINVAL;
@@ -1536,78 +1581,85 @@ static const struct bpf_func_proto bpf_timer_cancel_proto = {
.arg1_type = ARG_PTR_TO_TIMER,
};
-static struct bpf_async_cb *__bpf_async_cancel_and_free(struct bpf_async_kern *async)
+static void bpf_async_process_op(struct bpf_async_cb *cb, u32 op,
+ u64 timer_nsec, u32 timer_mode)
+{
+ switch (cb->type) {
+ case BPF_ASYNC_TYPE_TIMER: {
+ struct bpf_hrtimer *t = container_of(cb, struct bpf_hrtimer, cb);
+
+ switch (op) {
+ case BPF_ASYNC_START:
+ hrtimer_start(&t->timer, ns_to_ktime(timer_nsec), timer_mode);
+ break;
+ case BPF_ASYNC_CANCEL:
+ hrtimer_try_to_cancel(&t->timer);
+ break;
+ }
+ break;
+ }
+ case BPF_ASYNC_TYPE_WQ: {
+ struct bpf_work *w = container_of(cb, struct bpf_work, cb);
+
+ switch (op) {
+ case BPF_ASYNC_START:
+ schedule_work(&w->work);
+ break;
+ case BPF_ASYNC_CANCEL:
+ cancel_work(&w->work);
+ break;
+ }
+ break;
+ }
+ }
+ bpf_async_refcount_put(cb);
+}
+
+static void bpf_async_irq_worker(struct irq_work *work)
+{
+ struct bpf_async_cb *cb = container_of(work, struct bpf_async_cb, worker);
+ struct llist_node *pos, *n, *list;
+
+ list = llist_del_all(&cb->async_cmds);
+ if (!list)
+ return;
+
+ list = llist_reverse_order(list);
+ llist_for_each_safe(pos, n, list) {
+ struct bpf_async_cmd *cmd;
+
+ cmd = container_of(pos, struct bpf_async_cmd, node);
+ bpf_async_process_op(cb, cmd->op, cmd->nsec, cmd->mode);
+ kfree_nolock(cmd);
+ }
+}
+
+static void bpf_async_cancel_and_free(struct bpf_async_kern *async)
{
struct bpf_async_cb *cb;
- /* Performance optimization: read async->cb without lock first. */
if (!READ_ONCE(async->cb))
- return NULL;
+ return;
- __bpf_spin_lock_irqsave(&async->lock);
- /* re-read it under lock */
- cb = async->cb;
+ cb = xchg(&async->cb, NULL);
if (!cb)
- goto out;
- bpf_async_update_prog_callback(cb, NULL, NULL);
- /* The subsequent bpf_timer_start/cancel() helpers won't be able to use
- * this timer, since it won't be initialized.
- */
- WRITE_ONCE(async->cb, NULL);
-out:
- __bpf_spin_unlock_irqrestore(&async->lock);
- return cb;
-}
+ return;
-static void bpf_timer_delete(struct bpf_hrtimer *t)
-{
/*
- * We check that bpf_map_delete/update_elem() was called from timer
- * callback_fn. In such case we don't call hrtimer_cancel() (since it
- * will deadlock) and don't call hrtimer_try_to_cancel() (since it will
- * just return -1). Though callback_fn is still running on this cpu it's
- * safe to do kfree(t) because bpf_timer_cb() read everything it needed
- * from 't'. The bpf subprog callback_fn won't be able to access 't',
- * since async->cb = NULL was already done. The timer will be
- * effectively cancelled because bpf_timer_cb() will return
- * HRTIMER_NORESTART.
- *
- * However, it is possible the timer callback_fn calling us armed the
- * timer _before_ calling us, such that failing to cancel it here will
- * cause it to possibly use struct hrtimer after freeing bpf_hrtimer.
- * Therefore, we _need_ to cancel any outstanding timers before we do
- * call_rcu, even though no more timers can be armed.
- *
- * Moreover, we need to schedule work even if timer does not belong to
- * the calling callback_fn, as on two different CPUs, we can end up in a
- * situation where both sides run in parallel, try to cancel one
- * another, and we end up waiting on both sides in hrtimer_cancel
- * without making forward progress, since timer1 depends on time2
- * callback to finish, and vice versa.
- *
- * CPU 1 (timer1_cb) CPU 2 (timer2_cb)
- * bpf_timer_cancel_and_free(timer2) bpf_timer_cancel_and_free(timer1)
- *
- * To avoid these issues, punt to workqueue context when we are in a
- * timer callback.
+ * No refcount_inc_not_zero(&cb->refcnt) here. Dropping the last
+ * refcnt. Either synchronously or asynchronously in irq_work.
*/
- if (this_cpu_read(hrtimer_running)) {
- queue_work(system_dfl_wq, &t->cb.delete_work);
- return;
- }
- if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
- /* If the timer is running on other CPU, also use a kworker to
- * wait for the completion of the timer instead of trying to
- * acquire a sleepable lock in hrtimer_cancel() to wait for its
- * completion.
- */
- if (hrtimer_try_to_cancel(&t->timer) >= 0)
- call_rcu(&t->cb.rcu, bpf_async_cb_rcu_free);
- else
- queue_work(system_dfl_wq, &t->cb.delete_work);
+ if (!in_hardirq()) {
+ bpf_async_process_op(cb, BPF_ASYNC_CANCEL, 0, 0);
} else {
- bpf_timer_delete_work(&t->cb.delete_work);
+ (void)bpf_async_schedule_op(cb, BPF_ASYNC_CANCEL, 0, 0);
+ /*
+ * bpf_async_schedule_op() either enqueues allocated cmd into llist
+ * or fails with ENOMEM and drops the last refcnt.
+ * This is unlikely, but safe, since bpf_async_cb_rcu_tasks_trace_free()
+ * callback will do additional timer/wq_cancel due to races anyway.
+ */
}
}
@@ -1617,33 +1669,16 @@ static void bpf_timer_delete(struct bpf_hrtimer *t)
*/
void bpf_timer_cancel_and_free(void *val)
{
- struct bpf_hrtimer *t;
-
- t = (struct bpf_hrtimer *)__bpf_async_cancel_and_free(val);
- if (!t)
- return;
-
- bpf_timer_delete(t);
+ bpf_async_cancel_and_free(val);
}
-/* This function is called by map_delete/update_elem for individual element and
+/*
+ * This function is called by map_delete/update_elem for individual element and
* by ops->map_release_uref when the user space reference to a map reaches zero.
*/
void bpf_wq_cancel_and_free(void *val)
{
- struct bpf_work *work;
-
- BTF_TYPE_EMIT(struct bpf_wq);
-
- work = (struct bpf_work *)__bpf_async_cancel_and_free(val);
- if (!work)
- return;
- /* Trigger cancel of the sleepable work, but *do not* wait for
- * it to finish if it was running as we might not be in a
- * sleepable context.
- * kfree will be called once the work has finished.
- */
- schedule_work(&work->delete_work);
+ bpf_async_cancel_and_free(val);
}
BPF_CALL_2(bpf_kptr_xchg, void *, dst, void *, ptr)
@@ -3116,16 +3151,23 @@ __bpf_kfunc int bpf_wq_start(struct bpf_wq *wq, unsigned int flags)
struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
struct bpf_work *w;
- if (in_nmi())
- return -EOPNOTSUPP;
if (flags)
return -EINVAL;
+
w = READ_ONCE(async->work);
if (!w || !READ_ONCE(w->cb.prog))
return -EINVAL;
- schedule_work(&w->work);
- return 0;
+ if (!refcount_inc_not_zero(&w->cb.refcnt))
+ return -ENOENT;
+
+ if (!in_hardirq()) {
+ schedule_work(&w->work);
+ bpf_async_refcount_put(&w->cb);
+ return 0;
+ } else {
+ return bpf_async_schedule_op(&w->cb, BPF_ASYNC_START, 0, 0);
+ }
}
__bpf_kfunc int bpf_wq_set_callback(struct bpf_wq *wq,
--
2.47.3
* [PATCH v9 bpf-next 2/9] bpf: Add verifier support for bpf_timer argument in kfuncs
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context Alexei Starovoitov
@ 2026-02-01 2:53 ` Alexei Starovoitov
2026-02-01 3:15 ` bot+bpf-ci
2026-02-01 2:53 ` [PATCH v9 bpf-next 3/9] bpf: Introduce bpf_timer_cancel_async() kfunc Alexei Starovoitov
` (7 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:53 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Mykyta Yatsenko <yatsenko@meta.com>
Extend the verifier to recognize struct bpf_timer as a valid kfunc
argument type. Previously, bpf_timer was only supported as an argument to
BPF helpers.
This prepares for adding timer-related kfuncs in subsequent patches.
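As an example of what this enables, the kfunc added in patch 3 of this
series takes a bpf_timer pointer directly; the verifier matches the
argument by its BTF type and validates it the same way as the existing
ARG_PTR_TO_TIMER helper argument:

/* Declaration from patch 3 of this series. The verifier routes this
 * argument through KF_ARG_PTR_TO_TIMER -> process_timer_kfunc() ->
 * check_map_field_pointer(..., BPF_TIMER, ...).
 */
__bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer);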
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/verifier.c | 55 +++++++++++++++++++++++++++++--------------
1 file changed, 37 insertions(+), 18 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6b62b6d57175..9b1853fec730 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8664,13 +8664,25 @@ static int check_map_field_pointer(struct bpf_verifier_env *env, u32 regno,
}
static int process_timer_func(struct bpf_verifier_env *env, int regno,
- struct bpf_call_arg_meta *meta)
+ struct bpf_map_desc *map)
{
if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
verbose(env, "bpf_timer cannot be used for PREEMPT_RT.\n");
return -EOPNOTSUPP;
}
- return check_map_field_pointer(env, regno, BPF_TIMER, &meta->map);
+ return check_map_field_pointer(env, regno, BPF_TIMER, map);
+}
+
+static int process_timer_helper(struct bpf_verifier_env *env, int regno,
+ struct bpf_call_arg_meta *meta)
+{
+ return process_timer_func(env, regno, &meta->map);
+}
+
+static int process_timer_kfunc(struct bpf_verifier_env *env, int regno,
+ struct bpf_kfunc_call_arg_meta *meta)
+{
+ return process_timer_func(env, regno, &meta->map);
}
static int process_kptr_func(struct bpf_verifier_env *env, int regno,
@@ -9956,7 +9968,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
}
break;
case ARG_PTR_TO_TIMER:
- err = process_timer_func(env, regno, meta);
+ err = process_timer_helper(env, regno, meta);
if (err)
return err;
break;
@@ -12221,7 +12233,8 @@ enum {
KF_ARG_WORKQUEUE_ID,
KF_ARG_RES_SPIN_LOCK_ID,
KF_ARG_TASK_WORK_ID,
- KF_ARG_PROG_AUX_ID
+ KF_ARG_PROG_AUX_ID,
+ KF_ARG_TIMER_ID
};
BTF_ID_LIST(kf_arg_btf_ids)
@@ -12234,6 +12247,7 @@ BTF_ID(struct, bpf_wq)
BTF_ID(struct, bpf_res_spin_lock)
BTF_ID(struct, bpf_task_work)
BTF_ID(struct, bpf_prog_aux)
+BTF_ID(struct, bpf_timer)
static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
const struct btf_param *arg, int type)
@@ -12277,6 +12291,11 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
}
+static bool is_kfunc_arg_timer(const struct btf *btf, const struct btf_param *arg)
+{
+ return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_TIMER_ID);
+}
+
static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg)
{
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID);
@@ -12376,6 +12395,7 @@ enum kfunc_ptr_arg_type {
KF_ARG_PTR_TO_NULL,
KF_ARG_PTR_TO_CONST_STR,
KF_ARG_PTR_TO_MAP,
+ KF_ARG_PTR_TO_TIMER,
KF_ARG_PTR_TO_WORKQUEUE,
KF_ARG_PTR_TO_IRQ_FLAG,
KF_ARG_PTR_TO_RES_SPIN_LOCK,
@@ -12625,6 +12645,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (is_kfunc_arg_wq(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_WORKQUEUE;
+ if (is_kfunc_arg_timer(meta->btf, &args[argno]))
+ return KF_ARG_PTR_TO_TIMER;
+
if (is_kfunc_arg_task_work(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_TASK_WORK;
@@ -13411,6 +13434,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
case KF_ARG_PTR_TO_CONST_STR:
case KF_ARG_PTR_TO_WORKQUEUE:
+ case KF_ARG_PTR_TO_TIMER:
case KF_ARG_PTR_TO_TASK_WORK:
case KF_ARG_PTR_TO_IRQ_FLAG:
case KF_ARG_PTR_TO_RES_SPIN_LOCK:
@@ -13710,6 +13734,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
if (ret < 0)
return ret;
break;
+ case KF_ARG_PTR_TO_TIMER:
+ if (reg->type != PTR_TO_MAP_VALUE) {
+ verbose(env, "arg#%d doesn't point to a map value\n", i);
+ return -EINVAL;
+ }
+ ret = process_timer_kfunc(env, regno, meta);
+ if (ret < 0)
+ return ret;
+ break;
case KF_ARG_PTR_TO_TASK_WORK:
if (reg->type != PTR_TO_MAP_VALUE) {
verbose(env, "arg#%d doesn't point to a map value\n", i);
@@ -21294,20 +21327,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
}
}
- if (btf_record_has_field(map->record, BPF_TIMER)) {
- if (is_tracing_prog_type(prog_type)) {
- verbose(env, "tracing progs cannot use bpf_timer yet\n");
- return -EINVAL;
- }
- }
-
- if (btf_record_has_field(map->record, BPF_WORKQUEUE)) {
- if (is_tracing_prog_type(prog_type)) {
- verbose(env, "tracing progs cannot use bpf_wq yet\n");
- return -EINVAL;
- }
- }
-
if ((bpf_prog_is_offloaded(prog->aux) || bpf_map_is_offloaded(map)) &&
!bpf_offload_prog_map_match(prog, map)) {
verbose(env, "offload device mismatch between prog and map\n");
--
2.47.3
* [PATCH v9 bpf-next 3/9] bpf: Introduce bpf_timer_cancel_async() kfunc
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 2/9] bpf: Add verifier support for bpf_timer argument in kfuncs Alexei Starovoitov
@ 2026-02-01 2:53 ` Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 4/9] selftests/bpf: Refactor timer selftests Alexei Starovoitov
` (6 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:53 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Alexei Starovoitov <ast@kernel.org>
Introduce bpf_timer_cancel_async() that wraps hrtimer_try_to_cancel()
and either executes it synchronously or defers it to irq_work.
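A minimal BPF-side usage sketch (assuming the kfunc is declared with an
explicit __ksym extern, as BPF programs typically do; the helper function
name is invented for the example):

extern int bpf_timer_cancel_async(struct bpf_timer *timer) __ksym;

static int try_cancel(struct bpf_timer *timer)
{
	int ret = bpf_timer_cancel_async(timer);

	if (ret == -ECANCELED) {
		/* called from hardirq/NMI: cancellation was queued to irq_work */
		return 0;
	}
	/*
	 * 0: timer was not active, 1: timer was active and got cancelled,
	 * -1: the callback is running right now and cannot be stopped.
	 */
	return ret;
}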
Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/helpers.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 2eb262d52232..2ea0c08f5ef4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -4426,6 +4426,53 @@ __bpf_kfunc int bpf_dynptr_file_discard(struct bpf_dynptr *dynptr)
return 0;
}
+/**
+ * bpf_timer_cancel_async - try to deactivate a timer
+ * @timer: bpf_timer to stop
+ *
+ * Returns:
+ *
+ * * 0 when the timer was not active
+ * * 1 when the timer was active
+ * * -1 when the timer is currently executing the callback function and
+ * cannot be stopped
+ * * -ECANCELED when the timer will be cancelled asynchronously
+ * * -ENOMEM when out of memory
+ * * -EINVAL when the timer was not initialized
+ * * -ENOENT when this kfunc is racing with timer deletion
+ */
+__bpf_kfunc int bpf_timer_cancel_async(struct bpf_timer *timer)
+{
+ struct bpf_async_kern *async = (void *)timer;
+ struct bpf_async_cb *cb;
+ int ret;
+
+ cb = READ_ONCE(async->cb);
+ if (!cb)
+ return -EINVAL;
+
+ /*
+ * Unlike hrtimer_start() it's ok to synchronously call
+ * hrtimer_try_to_cancel() when refcnt reached zero, but deferring to
+ * irq_work is not, since irq callback may execute after RCU GP and
+ * cb could be freed at that time. Check for refcnt zero for
+ * consistency.
+ */
+ if (!refcount_inc_not_zero(&cb->refcnt))
+ return -ENOENT;
+
+ if (!in_hardirq()) {
+ struct bpf_hrtimer *t = container_of(cb, struct bpf_hrtimer, cb);
+
+ ret = hrtimer_try_to_cancel(&t->timer);
+ bpf_async_refcount_put(cb);
+ return ret;
+ } else {
+ ret = bpf_async_schedule_op(cb, BPF_ASYNC_CANCEL, 0, 0);
+ return ret ? ret : -ECANCELED;
+ }
+}
+
__bpf_kfunc_end_defs();
static void bpf_task_work_cancel_scheduled(struct irq_work *irq_work)
@@ -4608,6 +4655,7 @@ BTF_ID_FLAGS(func, bpf_task_work_schedule_signal, KF_IMPLICIT_ARGS)
BTF_ID_FLAGS(func, bpf_task_work_schedule_resume, KF_IMPLICIT_ARGS)
BTF_ID_FLAGS(func, bpf_dynptr_from_file)
BTF_ID_FLAGS(func, bpf_dynptr_file_discard)
+BTF_ID_FLAGS(func, bpf_timer_cancel_async)
BTF_KFUNCS_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {
--
2.47.3
* [PATCH v9 bpf-next 4/9] selftests/bpf: Refactor timer selftests
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
` (2 preceding siblings ...)
2026-02-01 2:53 ` [PATCH v9 bpf-next 3/9] bpf: Introduce bpf_timer_cancel_async() kfunc Alexei Starovoitov
@ 2026-02-01 2:53 ` Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 5/9] selftests/bpf: Add stress test for timer async cancel Alexei Starovoitov
` (5 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:53 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Mykyta Yatsenko <yatsenko@meta.com>
Refactor timer selftests, extracting stress test into a separate test.
This makes it easier to debug test failures and to extend the tests.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
.../testing/selftests/bpf/prog_tests/timer.c | 55 ++++++++++++-------
1 file changed, 36 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
index 34f9ccce2602..4d853d1bd2a7 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer.c
@@ -22,13 +22,35 @@ static void *spin_lock_thread(void *arg)
pthread_exit(arg);
}
-static int timer(struct timer *timer_skel)
+
+static int timer_stress(struct timer *timer_skel)
{
- int i, err, prog_fd;
+ int i, err = 1, prog_fd;
LIBBPF_OPTS(bpf_test_run_opts, topts);
pthread_t thread_id[NUM_THR];
void *ret;
+ prog_fd = bpf_program__fd(timer_skel->progs.race);
+ for (i = 0; i < NUM_THR; i++) {
+ err = pthread_create(&thread_id[i], NULL,
+ &spin_lock_thread, &prog_fd);
+ if (!ASSERT_OK(err, "pthread_create"))
+ break;
+ }
+
+ while (i) {
+ err = pthread_join(thread_id[--i], &ret);
+ if (ASSERT_OK(err, "pthread_join"))
+ ASSERT_EQ(ret, (void *)&prog_fd, "pthread_join");
+ }
+ return err;
+}
+
+static int timer(struct timer *timer_skel)
+{
+ int err, prog_fd;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
err = timer__attach(timer_skel);
if (!ASSERT_OK(err, "timer_attach"))
return err;
@@ -63,25 +85,10 @@ static int timer(struct timer *timer_skel)
/* check that code paths completed */
ASSERT_EQ(timer_skel->bss->ok, 1 | 2 | 4, "ok");
- prog_fd = bpf_program__fd(timer_skel->progs.race);
- for (i = 0; i < NUM_THR; i++) {
- err = pthread_create(&thread_id[i], NULL,
- &spin_lock_thread, &prog_fd);
- if (!ASSERT_OK(err, "pthread_create"))
- break;
- }
-
- while (i) {
- err = pthread_join(thread_id[--i], &ret);
- if (ASSERT_OK(err, "pthread_join"))
- ASSERT_EQ(ret, (void *)&prog_fd, "pthread_join");
- }
-
return 0;
}
-/* TODO: use pid filtering */
-void serial_test_timer(void)
+static void test_timer(int (*timer_test_fn)(struct timer *timer_skel))
{
struct timer *timer_skel = NULL;
int err;
@@ -94,13 +101,23 @@ void serial_test_timer(void)
if (!ASSERT_OK_PTR(timer_skel, "timer_skel_load"))
return;
- err = timer(timer_skel);
+ err = timer_test_fn(timer_skel);
ASSERT_OK(err, "timer");
timer__destroy(timer_skel);
+}
+
+void serial_test_timer(void)
+{
+ test_timer(timer);
RUN_TESTS(timer_failure);
}
+void serial_test_timer_stress(void)
+{
+ test_timer(timer_stress);
+}
+
void test_timer_interrupt(void)
{
struct timer_interrupt *skel = NULL;
--
2.47.3
* [PATCH v9 bpf-next 5/9] selftests/bpf: Add stress test for timer async cancel
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
` (3 preceding siblings ...)
2026-02-01 2:53 ` [PATCH v9 bpf-next 4/9] selftests/bpf: Refactor timer selftests Alexei Starovoitov
@ 2026-02-01 2:53 ` Alexei Starovoitov
2026-02-01 2:54 ` [PATCH v9 bpf-next 6/9] selftests/bpf: Verify bpf_timer_cancel_async works Alexei Starovoitov
` (4 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:53 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Mykyta Yatsenko <yatsenko@meta.com>
Extend the BPF timer selftest to run a stress test for async cancel.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
tools/testing/selftests/bpf/prog_tests/timer.c | 18 +++++++++++++++++-
tools/testing/selftests/bpf/progs/timer.c | 14 +++++++++++---
2 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
index 4d853d1bd2a7..a157a2a699e6 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer.c
@@ -23,13 +23,14 @@ static void *spin_lock_thread(void *arg)
}
-static int timer_stress(struct timer *timer_skel)
+static int timer_stress_runner(struct timer *timer_skel, bool async_cancel)
{
int i, err = 1, prog_fd;
LIBBPF_OPTS(bpf_test_run_opts, topts);
pthread_t thread_id[NUM_THR];
void *ret;
+ timer_skel->bss->async_cancel = async_cancel;
prog_fd = bpf_program__fd(timer_skel->progs.race);
for (i = 0; i < NUM_THR; i++) {
err = pthread_create(&thread_id[i], NULL,
@@ -46,6 +47,16 @@ static int timer_stress(struct timer *timer_skel)
return err;
}
+static int timer_stress(struct timer *timer_skel)
+{
+ return timer_stress_runner(timer_skel, false);
+}
+
+static int timer_stress_async_cancel(struct timer *timer_skel)
+{
+ return timer_stress_runner(timer_skel, true);
+}
+
static int timer(struct timer *timer_skel)
{
int err, prog_fd;
@@ -118,6 +129,11 @@ void serial_test_timer_stress(void)
test_timer(timer_stress);
}
+void serial_test_timer_stress_async_cancel(void)
+{
+ test_timer(timer_stress_async_cancel);
+}
+
void test_timer_interrupt(void)
{
struct timer_interrupt *skel = NULL;
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index 4c677c001258..a81413514e4b 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -1,13 +1,17 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2021 Facebook */
-#include <linux/bpf.h>
-#include <time.h>
+
+#include <vmlinux.h>
#include <stdbool.h>
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
+#define CLOCK_MONOTONIC 1
+#define CLOCK_BOOTTIME 7
+
char _license[] SEC("license") = "GPL";
+
struct hmap_elem {
int counter;
struct bpf_timer timer;
@@ -63,6 +67,7 @@ __u64 callback_check = 52;
__u64 callback2_check = 52;
__u64 pinned_callback_check;
__s32 pinned_cpu;
+bool async_cancel = 0;
#define ARRAY 1
#define HTAB 2
@@ -419,7 +424,10 @@ int race(void *ctx)
bpf_timer_set_callback(timer, race_timer_callback);
bpf_timer_start(timer, 0, 0);
- bpf_timer_cancel(timer);
+ if (async_cancel)
+ bpf_timer_cancel_async(timer);
+ else
+ bpf_timer_cancel(timer);
return 0;
}
--
2.47.3
* [PATCH v9 bpf-next 6/9] selftests/bpf: Verify bpf_timer_cancel_async works
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
` (4 preceding siblings ...)
2026-02-01 2:53 ` [PATCH v9 bpf-next 5/9] selftests/bpf: Add stress test for timer async cancel Alexei Starovoitov
@ 2026-02-01 2:54 ` Alexei Starovoitov
2026-02-01 2:54 ` [PATCH v9 bpf-next 7/9] selftests/bpf: Add timer stress test in NMI context Alexei Starovoitov
` (3 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:54 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Mykyta Yatsenko <yatsenko@meta.com>
Add a test that verifies that bpf_timer_cancel_async() works: it can
cancel the callback successfully.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
.../testing/selftests/bpf/prog_tests/timer.c | 25 +++++++++++++++++++
tools/testing/selftests/bpf/progs/timer.c | 23 +++++++++++++++++
2 files changed, 48 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
index a157a2a699e6..2b932d4dfd43 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer.c
@@ -99,6 +99,26 @@ static int timer(struct timer *timer_skel)
return 0;
}
+static int timer_cancel_async(struct timer *timer_skel)
+{
+ int err, prog_fd;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ prog_fd = bpf_program__fd(timer_skel->progs.test_async_cancel_succeed);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(topts.retval, 0, "test_run");
+
+ usleep(500);
+ /* check that there were no errors in timer execution */
+ ASSERT_EQ(timer_skel->bss->err, 0, "err");
+
+ /* check that code paths completed */
+ ASSERT_EQ(timer_skel->bss->ok, 1 | 2 | 4, "ok");
+
+ return 0;
+}
+
static void test_timer(int (*timer_test_fn)(struct timer *timer_skel))
{
struct timer *timer_skel = NULL;
@@ -134,6 +154,11 @@ void serial_test_timer_stress_async_cancel(void)
test_timer(timer_stress_async_cancel);
}
+void serial_test_timer_async_cancel(void)
+{
+ test_timer(timer_cancel_async);
+}
+
void test_timer_interrupt(void)
{
struct timer_interrupt *skel = NULL;
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index a81413514e4b..4b4ca781e7cd 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -169,6 +169,29 @@ int BPF_PROG2(test1, int, a)
return 0;
}
+static int timer_error(void *map, int *key, struct bpf_timer *timer)
+{
+ err = 42;
+ return 0;
+}
+
+SEC("syscall")
+int test_async_cancel_succeed(void *ctx)
+{
+ struct bpf_timer *arr_timer;
+ int array_key = ARRAY;
+
+ arr_timer = bpf_map_lookup_elem(&array, &array_key);
+ if (!arr_timer)
+ return 0;
+ bpf_timer_init(arr_timer, &array, CLOCK_MONOTONIC);
+ bpf_timer_set_callback(arr_timer, timer_error);
+ bpf_timer_start(arr_timer, 100000 /* 100us */, 0);
+ bpf_timer_cancel_async(arr_timer);
+ ok = 7;
+ return 0;
+}
+
/* callback for prealloc and non-prealloca hashtab timers */
static int timer_cb2(void *map, int *key, struct hmap_elem *val)
{
--
2.47.3
* [PATCH v9 bpf-next 7/9] selftests/bpf: Add timer stress test in NMI context
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
` (5 preceding siblings ...)
2026-02-01 2:54 ` [PATCH v9 bpf-next 6/9] selftests/bpf: Verify bpf_timer_cancel_async works Alexei Starovoitov
@ 2026-02-01 2:54 ` Alexei Starovoitov
2026-02-01 2:54 ` [PATCH v9 bpf-next 8/9] selftests/bpf: Removed obsolete tests Alexei Starovoitov
` (2 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:54 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Mykyta Yatsenko <yatsenko@meta.com>
Add stress tests for BPF timers that run in NMI context using perf_event
programs attached to PERF_COUNT_HW_CPU_CYCLES.
The tests cover three scenarios:
- nmi_race: Tests concurrent timer start and async cancel operations
- nmi_update: Tests updating a map element (effectively deleting it and
inserting a new one, for an array map) from within a timer callback
- nmi_cancel: Tests a timer attempting to cancel itself from its own
callback
A common test_common() helper is used to share timer setup logic across
all test modes.
The tests spawn multiple threads in a child process to generate
perf events, which trigger the BPF programs in NMI context. Hit counters
verify that the NMI code paths were actually exercised.
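Condensed from run_nmi_test() in the diff below, the attach path that gets
the BPF program invoked from the PMU interrupt (NMI) context; child_pid and
timer_skel come from the surrounding test code:

struct perf_event_attr attr = {
	.type = PERF_TYPE_HARDWARE,
	.config = PERF_COUNT_HW_CPU_CYCLES,
	.size = sizeof(struct perf_event_attr),
	.sample_period = 10000,
};
/* measure the forked child on any CPU; pe_fd ownership passes to the link */
int pe_fd = syscall(__NR_perf_event_open, &attr, child_pid, -1, -1, 0);
struct bpf_link *link = bpf_program__attach_perf_event(timer_skel->progs.nmi_race, pe_fd);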
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
.../testing/selftests/bpf/prog_tests/timer.c | 158 ++++++++++++++++++
tools/testing/selftests/bpf/progs/timer.c | 85 ++++++++--
2 files changed, 231 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
index 2b932d4dfd43..09ff21e1ad2f 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer.c
@@ -1,12 +1,27 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2021 Facebook */
+#include <sched.h>
#include <test_progs.h>
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
#include "timer.skel.h"
#include "timer_failure.skel.h"
#include "timer_interrupt.skel.h"
#define NUM_THR 8
+static int perf_event_open(__u32 type, __u64 config, int pid, int cpu)
+{
+ struct perf_event_attr attr = {
+ .type = type,
+ .config = config,
+ .size = sizeof(struct perf_event_attr),
+ .sample_period = 10000,
+ };
+
+ return syscall(__NR_perf_event_open, &attr, pid, cpu, -1, 0);
+}
+
static void *spin_lock_thread(void *arg)
{
int i, err, prog_fd = *(int *)arg;
@@ -57,6 +72,134 @@ static int timer_stress_async_cancel(struct timer *timer_skel)
return timer_stress_runner(timer_skel, true);
}
+static void *nmi_cpu_worker(void *arg)
+{
+ volatile __u64 num = 1;
+ int i;
+
+ for (i = 0; i < 500000000; ++i)
+ num *= (i % 7) + 1;
+ (void)num;
+
+ return NULL;
+}
+
+static int run_nmi_test(struct timer *timer_skel, struct bpf_program *prog)
+{
+ struct bpf_link *link = NULL;
+ int pe_fd = -1, pipefd[2] = {-1, -1}, pid = 0, status;
+ char buf = 0;
+ int ret = -1;
+
+ if (!ASSERT_OK(pipe(pipefd), "pipe"))
+ goto cleanup;
+
+ pid = fork();
+ if (pid == 0) {
+ /* Child: spawn multiple threads to consume multiple CPUs */
+ pthread_t threads[NUM_THR];
+ int i;
+
+ close(pipefd[1]);
+ read(pipefd[0], &buf, 1);
+ close(pipefd[0]);
+
+ for (i = 0; i < NUM_THR; i++)
+ pthread_create(&threads[i], NULL, nmi_cpu_worker, NULL);
+ for (i = 0; i < NUM_THR; i++)
+ pthread_join(threads[i], NULL);
+ exit(0);
+ }
+
+ if (!ASSERT_GE(pid, 0, "fork"))
+ goto cleanup;
+
+ /* Open perf event for child process across all CPUs */
+ pe_fd = perf_event_open(PERF_TYPE_HARDWARE,
+ PERF_COUNT_HW_CPU_CYCLES,
+ pid, /* measure child process */
+ -1); /* on any CPU */
+ if (pe_fd < 0) {
+ if (errno == ENOENT || errno == EOPNOTSUPP) {
+ printf("SKIP:no PERF_COUNT_HW_CPU_CYCLES\n");
+ test__skip();
+ ret = EOPNOTSUPP;
+ goto cleanup;
+ }
+ ASSERT_GE(pe_fd, 0, "perf_event_open");
+ goto cleanup;
+ }
+
+ link = bpf_program__attach_perf_event(prog, pe_fd);
+ if (!ASSERT_OK_PTR(link, "attach_perf_event"))
+ goto cleanup;
+ pe_fd = -1; /* Ownership transferred to link */
+
+ /* Signal child to start CPU work */
+ close(pipefd[0]);
+ pipefd[0] = -1;
+ write(pipefd[1], &buf, 1);
+ close(pipefd[1]);
+ pipefd[1] = -1;
+
+ waitpid(pid, &status, 0);
+ pid = 0;
+
+ /* Verify NMI context was hit */
+ ASSERT_GT(timer_skel->bss->test_hits, 0, "test_hits");
+ ret = 0;
+
+cleanup:
+ bpf_link__destroy(link);
+ if (pe_fd >= 0)
+ close(pe_fd);
+ if (pid > 0) {
+ write(pipefd[1], &buf, 1);
+ waitpid(pid, &status, 0);
+ }
+ if (pipefd[0] >= 0)
+ close(pipefd[0]);
+ if (pipefd[1] >= 0)
+ close(pipefd[1]);
+ return ret;
+}
+
+static int timer_stress_nmi_race(struct timer *timer_skel)
+{
+ int err;
+
+ err = run_nmi_test(timer_skel, timer_skel->progs.nmi_race);
+ if (err == EOPNOTSUPP)
+ return 0;
+ return err;
+}
+
+static int timer_stress_nmi_update(struct timer *timer_skel)
+{
+ int err;
+
+ err = run_nmi_test(timer_skel, timer_skel->progs.nmi_update);
+ if (err == EOPNOTSUPP)
+ return 0;
+ if (err)
+ return err;
+ ASSERT_GT(timer_skel->bss->update_hits, 0, "update_hits");
+ return 0;
+}
+
+static int timer_stress_nmi_cancel(struct timer *timer_skel)
+{
+ int err;
+
+ err = run_nmi_test(timer_skel, timer_skel->progs.nmi_cancel);
+ if (err == EOPNOTSUPP)
+ return 0;
+ if (err)
+ return err;
+ ASSERT_GT(timer_skel->bss->cancel_hits, 0, "cancel_hits");
+ return 0;
+}
+
static int timer(struct timer *timer_skel)
{
int err, prog_fd;
@@ -159,6 +302,21 @@ void serial_test_timer_async_cancel(void)
test_timer(timer_cancel_async);
}
+void serial_test_timer_stress_nmi_race(void)
+{
+ test_timer(timer_stress_nmi_race);
+}
+
+void serial_test_timer_stress_nmi_update(void)
+{
+ test_timer(timer_stress_nmi_update);
+}
+
+void serial_test_timer_stress_nmi_cancel(void)
+{
+ test_timer(timer_stress_nmi_cancel);
+}
+
void test_timer_interrupt(void)
{
struct timer_interrupt *skel = NULL;
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index 4b4ca781e7cd..d6d5fefcd9b1 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -63,6 +63,9 @@ __u64 bss_data;
__u64 abs_data;
__u64 err;
__u64 ok;
+__u64 test_hits;
+__u64 update_hits;
+__u64 cancel_hits;
__u64 callback_check = 52;
__u64 callback2_check = 52;
__u64 pinned_callback_check;
@@ -427,30 +430,88 @@ static int race_timer_callback(void *race_array, int *race_key, struct bpf_timer
return 0;
}
-SEC("syscall")
-int race(void *ctx)
+/* Callback that updates its own map element */
+static int update_self_callback(void *map, int *key, struct bpf_timer *timer)
+{
+ struct elem init = {};
+
+ bpf_map_update_elem(map, key, &init, BPF_ANY);
+ __sync_fetch_and_add(&update_hits, 1);
+ return 0;
+}
+
+/* Callback that cancels itself using async cancel */
+static int cancel_self_callback(void *map, int *key, struct bpf_timer *timer)
+{
+ bpf_timer_cancel_async(timer);
+ __sync_fetch_and_add(&cancel_hits, 1);
+ return 0;
+}
+
+enum test_mode {
+ TEST_RACE_SYNC,
+ TEST_RACE_ASYNC,
+ TEST_UPDATE,
+ TEST_CANCEL,
+};
+
+static __always_inline int test_common(enum test_mode mode)
{
struct bpf_timer *timer;
- int err, race_key = 0;
struct elem init;
+ int ret, key = 0;
__builtin_memset(&init, 0, sizeof(struct elem));
- bpf_map_update_elem(&race_array, &race_key, &init, BPF_ANY);
- timer = bpf_map_lookup_elem(&race_array, &race_key);
+ bpf_map_update_elem(&race_array, &key, &init, BPF_ANY);
+ timer = bpf_map_lookup_elem(&race_array, &key);
if (!timer)
- return 1;
+ return 0;
+
+ ret = bpf_timer_init(timer, &race_array, CLOCK_MONOTONIC);
+ if (ret && ret != -EBUSY)
+ return 0;
- err = bpf_timer_init(timer, &race_array, CLOCK_MONOTONIC);
- if (err && err != -EBUSY)
- return 1;
+ if (mode == TEST_RACE_SYNC || mode == TEST_RACE_ASYNC)
+ bpf_timer_set_callback(timer, race_timer_callback);
+ else if (mode == TEST_UPDATE)
+ bpf_timer_set_callback(timer, update_self_callback);
+ else
+ bpf_timer_set_callback(timer, cancel_self_callback);
- bpf_timer_set_callback(timer, race_timer_callback);
bpf_timer_start(timer, 0, 0);
- if (async_cancel)
+
+ if (mode == TEST_RACE_ASYNC)
bpf_timer_cancel_async(timer);
- else
+ else if (mode == TEST_RACE_SYNC)
bpf_timer_cancel(timer);
return 0;
}
+
+SEC("syscall")
+int race(void *ctx)
+{
+ return test_common(async_cancel ? TEST_RACE_ASYNC : TEST_RACE_SYNC);
+}
+
+SEC("perf_event")
+int nmi_race(void *ctx)
+{
+ __sync_fetch_and_add(&test_hits, 1);
+ return test_common(TEST_RACE_ASYNC);
+}
+
+SEC("perf_event")
+int nmi_update(void *ctx)
+{
+ __sync_fetch_and_add(&test_hits, 1);
+ return test_common(TEST_UPDATE);
+}
+
+SEC("perf_event")
+int nmi_cancel(void *ctx)
+{
+ __sync_fetch_and_add(&test_hits, 1);
+ return test_common(TEST_CANCEL);
+}
--
2.47.3
* [PATCH v9 bpf-next 8/9] selftests/bpf: Removed obsolete tests
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
` (6 preceding siblings ...)
2026-02-01 2:54 ` [PATCH v9 bpf-next 7/9] selftests/bpf: Add timer stress test in NMI context Alexei Starovoitov
@ 2026-02-01 2:54 ` Alexei Starovoitov
2026-02-01 2:54 ` [PATCH v9 bpf-next 9/9] selftests/bpf: Add a test to stress bpf_timer_start and map_delete race Alexei Starovoitov
2026-02-04 1:10 ` [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq patchwork-bot+netdevbpf
9 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:54 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Mykyta Yatsenko <yatsenko@meta.com>
Now bpf_timer can be used in tracing programs, so these tests are no
longer relevant.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
.../bpf/progs/verifier_helper_restricted.c | 111 ------------------
1 file changed, 111 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/verifier_helper_restricted.c b/tools/testing/selftests/bpf/progs/verifier_helper_restricted.c
index 059aa716e3d0..889c9b78b912 100644
--- a/tools/testing/selftests/bpf/progs/verifier_helper_restricted.c
+++ b/tools/testing/selftests/bpf/progs/verifier_helper_restricted.c
@@ -17,17 +17,6 @@ struct {
__type(value, struct val);
} map_spin_lock SEC(".maps");
-struct timer {
- struct bpf_timer t;
-};
-
-struct {
- __uint(type, BPF_MAP_TYPE_ARRAY);
- __uint(max_entries, 1);
- __type(key, int);
- __type(value, struct timer);
-} map_timer SEC(".maps");
-
SEC("kprobe")
__description("bpf_ktime_get_coarse_ns is forbidden in BPF_PROG_TYPE_KPROBE")
__failure __msg("program of this type cannot use helper bpf_ktime_get_coarse_ns")
@@ -84,106 +73,6 @@ __naked void bpf_prog_type_raw_tracepoint_1(void)
: __clobber_all);
}
-SEC("kprobe")
-__description("bpf_timer_init isn restricted in BPF_PROG_TYPE_KPROBE")
-__failure __msg("tracing progs cannot use bpf_timer yet")
-__naked void in_bpf_prog_type_kprobe_2(void)
-{
- asm volatile (" \
- r2 = r10; \
- r2 += -8; \
- r1 = 0; \
- *(u64*)(r2 + 0) = r1; \
- r1 = %[map_timer] ll; \
- call %[bpf_map_lookup_elem]; \
- if r0 == 0 goto l0_%=; \
- r1 = r0; \
- r2 = %[map_timer] ll; \
- r3 = 1; \
-l0_%=: call %[bpf_timer_init]; \
- exit; \
-" :
- : __imm(bpf_map_lookup_elem),
- __imm(bpf_timer_init),
- __imm_addr(map_timer)
- : __clobber_all);
-}
-
-SEC("perf_event")
-__description("bpf_timer_init is forbidden in BPF_PROG_TYPE_PERF_EVENT")
-__failure __msg("tracing progs cannot use bpf_timer yet")
-__naked void bpf_prog_type_perf_event_2(void)
-{
- asm volatile (" \
- r2 = r10; \
- r2 += -8; \
- r1 = 0; \
- *(u64*)(r2 + 0) = r1; \
- r1 = %[map_timer] ll; \
- call %[bpf_map_lookup_elem]; \
- if r0 == 0 goto l0_%=; \
- r1 = r0; \
- r2 = %[map_timer] ll; \
- r3 = 1; \
-l0_%=: call %[bpf_timer_init]; \
- exit; \
-" :
- : __imm(bpf_map_lookup_elem),
- __imm(bpf_timer_init),
- __imm_addr(map_timer)
- : __clobber_all);
-}
-
-SEC("tracepoint")
-__description("bpf_timer_init is forbidden in BPF_PROG_TYPE_TRACEPOINT")
-__failure __msg("tracing progs cannot use bpf_timer yet")
-__naked void in_bpf_prog_type_tracepoint_2(void)
-{
- asm volatile (" \
- r2 = r10; \
- r2 += -8; \
- r1 = 0; \
- *(u64*)(r2 + 0) = r1; \
- r1 = %[map_timer] ll; \
- call %[bpf_map_lookup_elem]; \
- if r0 == 0 goto l0_%=; \
- r1 = r0; \
- r2 = %[map_timer] ll; \
- r3 = 1; \
-l0_%=: call %[bpf_timer_init]; \
- exit; \
-" :
- : __imm(bpf_map_lookup_elem),
- __imm(bpf_timer_init),
- __imm_addr(map_timer)
- : __clobber_all);
-}
-
-SEC("raw_tracepoint")
-__description("bpf_timer_init is forbidden in BPF_PROG_TYPE_RAW_TRACEPOINT")
-__failure __msg("tracing progs cannot use bpf_timer yet")
-__naked void bpf_prog_type_raw_tracepoint_2(void)
-{
- asm volatile (" \
- r2 = r10; \
- r2 += -8; \
- r1 = 0; \
- *(u64*)(r2 + 0) = r1; \
- r1 = %[map_timer] ll; \
- call %[bpf_map_lookup_elem]; \
- if r0 == 0 goto l0_%=; \
- r1 = r0; \
- r2 = %[map_timer] ll; \
- r3 = 1; \
-l0_%=: call %[bpf_timer_init]; \
- exit; \
-" :
- : __imm(bpf_map_lookup_elem),
- __imm(bpf_timer_init),
- __imm_addr(map_timer)
- : __clobber_all);
-}
-
SEC("kprobe")
__description("bpf_spin_lock is forbidden in BPF_PROG_TYPE_KPROBE")
__failure __msg("tracing progs cannot use bpf_spin_lock yet")
--
2.47.3
* [PATCH v9 bpf-next 9/9] selftests/bpf: Add a test to stress bpf_timer_start and map_delete race
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
` (7 preceding siblings ...)
2026-02-01 2:54 ` [PATCH v9 bpf-next 8/9] selftests/bpf: Removed obsolete tests Alexei Starovoitov
@ 2026-02-01 2:54 ` Alexei Starovoitov
2026-02-01 3:15 ` bot+bpf-ci
2026-02-04 1:10 ` [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq patchwork-bot+netdevbpf
9 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 2:54 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team
From: Alexei Starovoitov <ast@kernel.org>
Add a test to stress bpf_timer_start and map_delete race
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
.../bpf/prog_tests/timer_start_delete_race.c | 137 ++++++++++++++++++
.../bpf/progs/timer_start_delete_race.c | 66 +++++++++
2 files changed, 203 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/timer_start_delete_race.c
create mode 100644 tools/testing/selftests/bpf/progs/timer_start_delete_race.c
diff --git a/tools/testing/selftests/bpf/prog_tests/timer_start_delete_race.c b/tools/testing/selftests/bpf/prog_tests/timer_start_delete_race.c
new file mode 100644
index 000000000000..29a46e96f660
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/timer_start_delete_race.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <pthread.h>
+#include <test_progs.h>
+#include "timer_start_delete_race.skel.h"
+
+/*
+ * Test for race between bpf_timer_start() and map element deletion.
+ *
+ * The race scenario:
+ * - CPU 1: bpf_timer_start() proceeds to bpf_async_process() and is about
+ * to call hrtimer_start() but hasn't yet
+ * - CPU 2: map_delete_elem() calls __bpf_async_cancel_and_free(), since
+ * timer is not scheduled yet hrtimer_try_to_cancel() is a nop,
+ * then calls bpf_async_refcount_put() dropping refcnt to zero
+ * and scheduling call_rcu_tasks_trace()
+ * - CPU 1: continues and calls hrtimer_start()
+ * - After RCU tasks trace grace period: memory is freed
+ * - Timer callback fires on freed memory: UAF!
+ *
+ * This test stresses this race by having two threads:
+ * - Thread 1: repeatedly starts timers
+ * - Thread 2: repeatedly deletes map elements
+ *
+ * KASAN should detect use-after-free.
+ */
+
+#define ITERATIONS 1000
+
+struct ctx {
+ struct timer_start_delete_race *skel;
+ volatile bool start;
+ volatile bool stop;
+ int errors;
+};
+
+static void *start_timer_thread(void *arg)
+{
+ struct ctx *ctx = arg;
+ cpu_set_t cpuset;
+ int fd, i;
+
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
+
+ while (!ctx->start && !ctx->stop)
+ usleep(1);
+ if (ctx->stop)
+ return NULL;
+
+ fd = bpf_program__fd(ctx->skel->progs.start_timer);
+
+ for (i = 0; i < ITERATIONS && !ctx->stop; i++) {
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ int err;
+
+ err = bpf_prog_test_run_opts(fd, &opts);
+ if (err || opts.retval) {
+ ctx->errors++;
+ break;
+ }
+ }
+
+ return NULL;
+}
+
+static void *delete_elem_thread(void *arg)
+{
+ struct ctx *ctx = arg;
+ cpu_set_t cpuset;
+ int fd, i;
+
+ CPU_ZERO(&cpuset);
+ CPU_SET(1, &cpuset);
+ pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
+
+ while (!ctx->start && !ctx->stop)
+ usleep(1);
+ if (ctx->stop)
+ return NULL;
+
+ fd = bpf_program__fd(ctx->skel->progs.delete_elem);
+
+ for (i = 0; i < ITERATIONS && !ctx->stop; i++) {
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ int err;
+
+ err = bpf_prog_test_run_opts(fd, &opts);
+ if (err || opts.retval) {
+ ctx->errors++;
+ break;
+ }
+ }
+
+ return NULL;
+}
+
+void test_timer_start_delete_race(void)
+{
+ struct timer_start_delete_race *skel;
+ pthread_t threads[2];
+ struct ctx ctx = {};
+ int err;
+
+ skel = timer_start_delete_race__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+ return;
+
+ ctx.skel = skel;
+
+ err = pthread_create(&threads[0], NULL, start_timer_thread, &ctx);
+ if (!ASSERT_OK(err, "create start_timer_thread")) {
+ ctx.stop = true;
+ goto cleanup;
+ }
+
+ err = pthread_create(&threads[1], NULL, delete_elem_thread, &ctx);
+ if (!ASSERT_OK(err, "create delete_elem_thread")) {
+ ctx.stop = true;
+ pthread_join(threads[0], NULL);
+ goto cleanup;
+ }
+
+ ctx.start = true;
+
+ pthread_join(threads[0], NULL);
+ pthread_join(threads[1], NULL);
+
+ ASSERT_EQ(ctx.errors, 0, "thread_errors");
+
+ /* Either KASAN will catch UAF or kernel will crash or nothing happens */
+cleanup:
+ timer_start_delete_race__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/timer_start_delete_race.c b/tools/testing/selftests/bpf/progs/timer_start_delete_race.c
new file mode 100644
index 000000000000..2ff6357f33f9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/timer_start_delete_race.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <linux/bpf.h>
+#include <time.h>
+#include <bpf/bpf_helpers.h>
+
+#define ITER_CNT 2000
+
+struct map_value {
+ struct bpf_timer timer;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, int);
+ __type(value, struct map_value);
+ __uint(max_entries, 1);
+} timer_map SEC(".maps");
+
+long cb_cnt;
+
+/*
+ * Timer callback that accesses the map value.
+ * If the race bug exists and this runs on freed memory,
+ * KASAN should detect it.
+ */
+static int timer_cb(void *map, int *key, struct map_value *value)
+{
+ __sync_fetch_and_add(&cb_cnt, 1);
+ return 0;
+}
+
+SEC("syscall")
+int start_timer(void *ctx)
+{
+ struct map_value *value;
+ int i;
+
+ for (i = 0; i < ITER_CNT; i++) {
+ int key = 0;
+
+ value = bpf_map_lookup_elem(&timer_map, &key);
+ /* hey claude, no if (!value) here. The verifier got smarter */
+
+ bpf_timer_init(&value->timer, &timer_map, CLOCK_MONOTONIC);
+ bpf_timer_set_callback(&value->timer, timer_cb);
+ bpf_timer_start(&value->timer, 100000000, 0);
+ }
+ return 0;
+}
+
+SEC("syscall")
+int delete_elem(void *ctx)
+{
+ int i;
+
+ for (i = 0; i < ITER_CNT; i++) {
+ int key = 0;
+
+ bpf_map_delete_elem(&timer_map, &key);
+ }
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.47.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 9/9] selftests/bpf: Add a test to stress bpf_timer_start and map_delete race
2026-02-01 2:54 ` [PATCH v9 bpf-next 9/9] selftests/bpf: Add a test to stress bpf_timer_start and map_delete race Alexei Starovoitov
@ 2026-02-01 3:15 ` bot+bpf-ci
2026-02-01 3:30 ` Alexei Starovoitov
0 siblings, 1 reply; 21+ messages in thread
From: bot+bpf-ci @ 2026-02-01 3:15 UTC (permalink / raw)
To: alexei.starovoitov, bpf
Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team,
ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 1958 bytes --]
> diff --git a/tools/testing/selftests/bpf/progs/timer_start_delete_race.c b/tools/testing/selftests/bpf/progs/timer_start_delete_race.c
> new file mode 100644
> index 000000000..2ff6357f3
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/timer_start_delete_race.c
[ ... ]
> +SEC("syscall")
> +int start_timer(void *ctx)
> +{
> + struct map_value *value;
> + int i;
> +
> + for (i = 0; i < ITER_CNT; i++) {
> + int key = 0;
> +
> + value = bpf_map_lookup_elem(&timer_map, &key);
> + /* hey claude, no if (!value) here. The verifier got smarter */
> +
> + bpf_timer_init(&value->timer, &timer_map, CLOCK_MONOTONIC);
Will this program load? bpf_map_lookup_elem() returns PTR_TO_MAP_VALUE_OR_NULL
(defined in bpf_map_lookup_elem_proto in kernel/bpf/helpers.c), which requires
a NULL check before dereferencing.
The comment claims "the verifier got smarter", but I don't see any verifier
changes in this series that would eliminate the NULL check requirement for
bpf_map_lookup_elem(). All other timer tests in this series maintain the
NULL check:
timer = bpf_map_lookup_elem(&race_array, &key);
if (!timer)
return 0;
Without the NULL check, the verifier should reject this program with an
error about dereferencing PTR_TO_MAP_VALUE_OR_NULL.
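For illustration, the same loop body with the NULL check used by the other
tests would read (a sketch only, not something this patch contains):

	value = bpf_map_lookup_elem(&timer_map, &key);
	if (!value)
		continue;

	bpf_timer_init(&value->timer, &timer_map, CLOCK_MONOTONIC);
	bpf_timer_set_callback(&value->timer, timer_cb);
	bpf_timer_start(&value->timer, 100000000, 0);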
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21555452257
AI-authorship-score: low
AI-authorship-explanation: The code follows standard kernel test patterns with typical C style; the unusual 'hey claude' comment is more likely a developer joke than AI-generated content.
issues-found: 1
issue-severity-score: medium
issue-severity-explanation: BPF program will fail to load due to missing NULL check after bpf_map_lookup_elem(), causing the test to fail silently or with a verifier rejection error.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 2/9] bpf: Add verifier support for bpf_timer argument in kfuncs
2026-02-01 2:53 ` [PATCH v9 bpf-next 2/9] bpf: Add verifier support for bpf_timer argument in kfuncs Alexei Starovoitov
@ 2026-02-01 3:15 ` bot+bpf-ci
0 siblings, 0 replies; 21+ messages in thread
From: bot+bpf-ci @ 2026-02-01 3:15 UTC (permalink / raw)
To: alexei.starovoitov, bpf
Cc: daniel, andrii, martin.lau, memxor, mykyta.yatsenko5, kernel-team,
ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6b62b6d57..9b1853fec 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
[ ... ]
> @@ -21294,20 +21327,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> }
> }
>
> - if (btf_record_has_field(map->record, BPF_TIMER)) {
> - if (is_tracing_prog_type(prog_type)) {
> - verbose(env, "tracing progs cannot use bpf_timer yet\n");
> - return -EINVAL;
> - }
> - }
> -
> - if (btf_record_has_field(map->record, BPF_WORKQUEUE)) {
> - if (is_tracing_prog_type(prog_type)) {
> - verbose(env, "tracing progs cannot use bpf_wq yet\n");
> - return -EINVAL;
> - }
> - }
The commit message describes adding kfunc timer argument support, but doesn't
mention this removal of restrictions that previously prevented tracing programs
(kprobe, fentry, fexit, etc.) from using bpf_timer and bpf_wq.
This appears related to the previous commit fac023087 ("bpf: Enable bpf_timer
and bpf_wq in any context") which made the runtime changes necessary for this.
Should the commit message mention that this change also enables tracing
programs to use timers and workqueues, or would it be clearer as a separate
commit?
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21555452257
AI-authorship-score: low
AI-authorship-explanation: The code follows established kernel patterns exactly and shows familiarity with the BPF verifier codebase, consistent with human-written kernel code from a known BPF contributor.
issues-found: 1
issue-severity-score: low
issue-severity-explanation: The commit message doesn't mention the removal of tracing program restrictions for bpf_timer and bpf_wq, which is a documentation completeness issue rather than a functional bug.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 9/9] selftests/bpf: Add a test to stress bpf_timer_start and map_delete race
2026-02-01 3:15 ` bot+bpf-ci
@ 2026-02-01 3:30 ` Alexei Starovoitov
0 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-01 3:30 UTC (permalink / raw)
To: bot+bpf-ci
Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team,
Alexei Starovoitov, Eduard, Yonghong Song, Chris Mason,
Ihor Solodrai
On Sat, Jan 31, 2026 at 7:15 PM <bot+bpf-ci@kernel.org> wrote:
>
> > diff --git a/tools/testing/selftests/bpf/progs/timer_start_delete_race.c b/tools/testing/selftests/bpf/progs/timer_start_delete_race.c
> > new file mode 100644
> > index 000000000..2ff6357f3
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/timer_start_delete_race.c
>
> [ ... ]
>
> > +SEC("syscall")
> > +int start_timer(void *ctx)
> > +{
> > + struct map_value *value;
> > + int i;
> > +
> > + for (i = 0; i < ITER_CNT; i++) {
> > + int key = 0;
> > +
> > + value = bpf_map_lookup_elem(&timer_map, &key);
> > + /* hey claude, no if (!value) here. The verifier got smarter */
> > +
> > + bpf_timer_init(&value->timer, &timer_map, CLOCK_MONOTONIC);
>
> Will this program load? bpf_map_lookup_elem() returns PTR_TO_MAP_VALUE_OR_NULL
> (defined in bpf_map_lookup_elem_proto in kernel/bpf/helpers.c), which requires
> a NULL check before dereferencing.
>
> The comment claims "the verifier got smarter", but I don't see any verifier
> changes in this series that would eliminate the NULL check requirement for
> bpf_map_lookup_elem(). All other timer tests in this series maintain the
> NULL check:
>
> timer = bpf_map_lookup_elem(&race_array, &key);
> if (!timer)
> return 0;
>
> Without the NULL check, the verifier should reject this program with an
> error about dereferencing PTR_TO_MAP_VALUE_OR_NULL.
Ohh. Too bad. claude didn't trust the comment :)
It's a good thing that it didn't... of course.
We can tweak review prompts, but ideally we could document this
inside the source code somehow.
Ideas?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context
2026-02-01 2:53 ` [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context Alexei Starovoitov
@ 2026-02-02 13:36 ` Mykyta Yatsenko
2026-02-02 17:29 ` Alexei Starovoitov
2026-02-03 22:14 ` Kumar Kartikeya Dwivedi
2026-02-03 23:53 ` Andrii Nakryiko
2 siblings, 1 reply; 21+ messages in thread
From: Mykyta Yatsenko @ 2026-02-02 13:36 UTC (permalink / raw)
To: Alexei Starovoitov, bpf; +Cc: daniel, andrii, martin.lau, memxor, kernel-team
On 2/1/26 02:53, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> Refactor bpf_timer and bpf_wq to allow calling them from any context:
> - add refcnt to bpf_async_cb
> - map_delete_elem or map_free will drop refcnt to zero
> via bpf_async_cancel_and_free()
> - once refcnt is zero timer/wq_start is not allowed to make sure
> that callback cannot rearm itself
> - if in_hardirq defer to start/cancel operations to irq_work
>
> Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
> kernel/bpf/helpers.c | 408 ++++++++++++++++++++++++-------------------
> 1 file changed, 225 insertions(+), 183 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index b54ec0e945aa..2eb262d52232 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1095,16 +1095,34 @@ static void *map_key_from_value(struct bpf_map *map, void *value, u32 *arr_idx)
> return (void *)value - round_up(map->key_size, 8);
> }
>
> +enum bpf_async_type {
> + BPF_ASYNC_TYPE_TIMER = 0,
> + BPF_ASYNC_TYPE_WQ,
> +};
> +
> +enum bpf_async_op {
> + BPF_ASYNC_START,
> + BPF_ASYNC_CANCEL
> +};
> +
> +struct bpf_async_cmd {
> + struct llist_node node;
> + u64 nsec;
> + u32 mode;
> + enum bpf_async_op op;
> +};
> +
> struct bpf_async_cb {
> struct bpf_map *map;
> struct bpf_prog *prog;
> void __rcu *callback_fn;
> void *value;
> - union {
> - struct rcu_head rcu;
> - struct work_struct delete_work;
> - };
> + struct rcu_head rcu;
> u64 flags;
> + struct irq_work worker;
> + refcount_t refcnt;
> + enum bpf_async_type type;
> + struct llist_head async_cmds;
> };
>
> /* BPF map elements can contain 'struct bpf_timer'.
> @@ -1132,7 +1150,6 @@ struct bpf_hrtimer {
> struct bpf_work {
> struct bpf_async_cb cb;
> struct work_struct work;
> - struct work_struct delete_work;
> };
>
> /* the actual struct hidden inside uapi struct bpf_timer and bpf_wq */
> @@ -1142,20 +1159,12 @@ struct bpf_async_kern {
> struct bpf_hrtimer *timer;
> struct bpf_work *work;
> };
> - /* bpf_spin_lock is used here instead of spinlock_t to make
> - * sure that it always fits into space reserved by struct bpf_timer
> - * regardless of LOCKDEP and spinlock debug flags.
> - */
> - struct bpf_spin_lock lock;
> } __attribute__((aligned(8)));
>
> -enum bpf_async_type {
> - BPF_ASYNC_TYPE_TIMER = 0,
> - BPF_ASYNC_TYPE_WQ,
> -};
> -
> static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
>
> +static void bpf_async_refcount_put(struct bpf_async_cb *cb);
> +
> static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> {
> struct bpf_hrtimer *t = container_of(hrtimer, struct bpf_hrtimer, timer);
> @@ -1219,45 +1228,73 @@ static void bpf_async_cb_rcu_free(struct rcu_head *rcu)
> {
> struct bpf_async_cb *cb = container_of(rcu, struct bpf_async_cb, rcu);
>
> + /*
> + * Drop the last reference to prog only after RCU GP, as set_callback()
> + * may race with cancel_and_free()
> + */
> + if (cb->prog)
> + bpf_prog_put(cb->prog);
> +
> kfree_nolock(cb);
> }
>
> -static void bpf_wq_delete_work(struct work_struct *work)
> +/* Callback from call_rcu_tasks_trace, chains to call_rcu for final free */
> +static void bpf_async_cb_rcu_tasks_trace_free(struct rcu_head *rcu)
> {
> - struct bpf_work *w = container_of(work, struct bpf_work, delete_work);
> + struct bpf_async_cb *cb = container_of(rcu, struct bpf_async_cb, rcu);
> + struct bpf_hrtimer *t = container_of(cb, struct bpf_hrtimer, cb);
> + struct bpf_work *w = container_of(cb, struct bpf_work, cb);
> + bool retry = false;
>
> - cancel_work_sync(&w->work);
> + /*
> + * bpf_async_cancel_and_free() tried to cancel timer/wq, but it
> + * could have raced with timer/wq_start. Now refcnt is zero and
> + * srcu/rcu GP completed. Cancel timer/wq again.
> + */
> + switch (cb->type) {
> + case BPF_ASYNC_TYPE_TIMER:
> + if (hrtimer_try_to_cancel(&t->timer) < 0)
> + retry = true;
> + break;
> + case BPF_ASYNC_TYPE_WQ:
> + if (!cancel_work(&w->work))
> + retry = true;
> + break;
> + }
> + if (retry) {
Isn't it the case that both timer and workqueue callbacks imply rcu locks?
Which scenario am I not accounting for when I think we can't get here?
> + /*
> + * hrtimer or wq callback may still be running. It must be
> + * in rcu_tasks_trace or rcu CS, so wait for GP again.
> + * It won't retry forever, since refcnt zero prevents all
> + * operations on timer/wq.
> + */
> + call_rcu_tasks_trace(&cb->rcu, bpf_async_cb_rcu_tasks_trace_free);
> + return;
> + }
>
> - call_rcu(&w->cb.rcu, bpf_async_cb_rcu_free);
> + /* rcu_trace_implies_rcu_gp() is true and will remain so */
> + bpf_async_cb_rcu_free(rcu);
> }
>
> -static void bpf_timer_delete_work(struct work_struct *work)
> +static void bpf_async_refcount_put(struct bpf_async_cb *cb)
> {
> - struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, cb.delete_work);
> + if (!refcount_dec_and_test(&cb->refcnt))
> + return;
>
> - /* Cancel the timer and wait for callback to complete if it was running.
> - * If hrtimer_cancel() can be safely called it's safe to call
> - * call_rcu() right after for both preallocated and non-preallocated
> - * maps. The async->cb = NULL was already done and no code path can see
> - * address 't' anymore. Timer if armed for existing bpf_hrtimer before
> - * bpf_timer_cancel_and_free will have been cancelled.
> - */
> - hrtimer_cancel(&t->timer);
> - call_rcu(&t->cb.rcu, bpf_async_cb_rcu_free);
> + call_rcu_tasks_trace(&cb->rcu, bpf_async_cb_rcu_tasks_trace_free);
> }
>
> +static void bpf_async_cancel_and_free(struct bpf_async_kern *async);
> +static void bpf_async_irq_worker(struct irq_work *work);
> +
> static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
> enum bpf_async_type type)
> {
> - struct bpf_async_cb *cb;
> + struct bpf_async_cb *cb, *old_cb;
> struct bpf_hrtimer *t;
> struct bpf_work *w;
> clockid_t clockid;
> size_t size;
> - int ret = 0;
> -
> - if (in_nmi())
> - return -EOPNOTSUPP;
>
> switch (type) {
> case BPF_ASYNC_TYPE_TIMER:
> @@ -1270,18 +1307,13 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> return -EINVAL;
> }
>
> - __bpf_spin_lock_irqsave(&async->lock);
> - t = async->timer;
> - if (t) {
> - ret = -EBUSY;
> - goto out;
> - }
> + old_cb = READ_ONCE(async->cb);
> + if (old_cb)
> + return -EBUSY;
>
> cb = bpf_map_kmalloc_nolock(map, size, 0, map->numa_node);
> - if (!cb) {
> - ret = -ENOMEM;
> - goto out;
> - }
> + if (!cb)
> + return -ENOMEM;
>
> switch (type) {
> case BPF_ASYNC_TYPE_TIMER:
> @@ -1289,7 +1321,6 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> t = (struct bpf_hrtimer *)cb;
>
> atomic_set(&t->cancelling, 0);
> - INIT_WORK(&t->cb.delete_work, bpf_timer_delete_work);
> hrtimer_setup(&t->timer, bpf_timer_cb, clockid, HRTIMER_MODE_REL_SOFT);
> cb->value = (void *)async - map->record->timer_off;
> break;
> @@ -1297,16 +1328,24 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> w = (struct bpf_work *)cb;
>
> INIT_WORK(&w->work, bpf_wq_work);
> - INIT_WORK(&w->delete_work, bpf_wq_delete_work);
> cb->value = (void *)async - map->record->wq_off;
> break;
> }
> cb->map = map;
> cb->prog = NULL;
> cb->flags = flags;
> + cb->worker = IRQ_WORK_INIT(bpf_async_irq_worker);
> + init_llist_head(&cb->async_cmds);
> + refcount_set(&cb->refcnt, 1); /* map's reference */
> + cb->type = type;
> rcu_assign_pointer(cb->callback_fn, NULL);
>
> - WRITE_ONCE(async->cb, cb);
> + old_cb = cmpxchg(&async->cb, NULL, cb);
> + if (old_cb) {
> + /* Lost the race to initialize this bpf_async_kern, drop the allocated object */
> + kfree_nolock(cb);
> + return -EBUSY;
> + }
> /* Guarantee the order between async->cb and map->usercnt. So
> * when there are concurrent uref release and bpf timer init, either
> * bpf_timer_cancel_and_free() called by uref release reads a no-NULL
> @@ -1317,13 +1356,11 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> /* maps with timers must be either held by user space
> * or pinned in bpffs.
> */
> - WRITE_ONCE(async->cb, NULL);
> - kfree_nolock(cb);
> - ret = -EPERM;
> + bpf_async_cancel_and_free(async);
> + return -EPERM;
> }
> -out:
> - __bpf_spin_unlock_irqrestore(&async->lock);
> - return ret;
> +
> + return 0;
> }
>
> BPF_CALL_3(bpf_timer_init, struct bpf_async_kern *, timer, struct bpf_map *, map,
> @@ -1354,8 +1391,9 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
> .arg3_type = ARG_ANYTHING,
> };
>
> -static int bpf_async_update_prog_callback(struct bpf_async_cb *cb, void *callback_fn,
> - struct bpf_prog *prog)
> +static int bpf_async_update_prog_callback(struct bpf_async_cb *cb,
> + struct bpf_prog *prog,
> + void *callback_fn)
> {
> struct bpf_prog *prev;
>
> @@ -1380,7 +1418,8 @@ static int bpf_async_update_prog_callback(struct bpf_async_cb *cb, void *callbac
> if (prev)
> bpf_prog_put(prev);
>
> - } while (READ_ONCE(cb->prog) != prog || READ_ONCE(cb->callback_fn) != callback_fn);
> + } while (READ_ONCE(cb->prog) != prog ||
> + (void __force *)READ_ONCE(cb->callback_fn) != callback_fn);
>
> if (prog)
> bpf_prog_put(prog);
> @@ -1388,33 +1427,36 @@ static int bpf_async_update_prog_callback(struct bpf_async_cb *cb, void *callbac
> return 0;
> }
>
> +static int bpf_async_schedule_op(struct bpf_async_cb *cb, enum bpf_async_op op,
> + u64 nsec, u32 timer_mode)
> +{
> + WARN_ON_ONCE(!in_hardirq());
> +
> + struct bpf_async_cmd *cmd = kmalloc_nolock(sizeof(*cmd), 0, NUMA_NO_NODE);
> +
> + if (!cmd) {
> + bpf_async_refcount_put(cb);
> + return -ENOMEM;
> + }
> + init_llist_node(&cmd->node);
> + cmd->nsec = nsec;
> + cmd->mode = timer_mode;
> + cmd->op = op;
> + if (llist_add(&cmd->node, &cb->async_cmds))
> + irq_work_queue(&cb->worker);
> + return 0;
> +}
> +
> static int __bpf_async_set_callback(struct bpf_async_kern *async, void *callback_fn,
> struct bpf_prog *prog)
> {
> struct bpf_async_cb *cb;
> - int ret = 0;
>
> - if (in_nmi())
> - return -EOPNOTSUPP;
> - __bpf_spin_lock_irqsave(&async->lock);
> - cb = async->cb;
> - if (!cb) {
> - ret = -EINVAL;
> - goto out;
> - }
> - if (!atomic64_read(&cb->map->usercnt)) {
> - /* maps with timers must be either held by user space
> - * or pinned in bpffs. Otherwise timer might still be
> - * running even when bpf prog is detached and user space
> - * is gone, since map_release_uref won't ever be called.
> - */
> - ret = -EPERM;
> - goto out;
> - }
> - ret = bpf_async_update_prog_callback(cb, callback_fn, prog);
> -out:
> - __bpf_spin_unlock_irqrestore(&async->lock);
> - return ret;
> + cb = READ_ONCE(async->cb);
> + if (!cb)
> + return -EINVAL;
> +
> + return bpf_async_update_prog_callback(cb, prog, callback_fn);
> }
>
> BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callback_fn,
> @@ -1431,22 +1473,17 @@ static const struct bpf_func_proto bpf_timer_set_callback_proto = {
> .arg2_type = ARG_PTR_TO_FUNC,
> };
>
> -BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, flags)
> +BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, async, u64, nsecs, u64, flags)
> {
> struct bpf_hrtimer *t;
> - int ret = 0;
> - enum hrtimer_mode mode;
> + u32 mode;
>
> - if (in_nmi())
> - return -EOPNOTSUPP;
> if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN))
> return -EINVAL;
> - __bpf_spin_lock_irqsave(&timer->lock);
> - t = timer->timer;
> - if (!t || !t->cb.prog) {
> - ret = -EINVAL;
> - goto out;
> - }
> +
> + t = READ_ONCE(async->timer);
> + if (!t || !READ_ONCE(t->cb.prog))
> + return -EINVAL;
>
> if (flags & BPF_F_TIMER_ABS)
> mode = HRTIMER_MODE_ABS_SOFT;
> @@ -1456,10 +1493,20 @@ BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, fla
> if (flags & BPF_F_TIMER_CPU_PIN)
> mode |= HRTIMER_MODE_PINNED;
>
> - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> -out:
> - __bpf_spin_unlock_irqrestore(&timer->lock);
> - return ret;
> + /*
> + * bpf_async_cancel_and_free() could have dropped refcnt to zero. In
> + * such case BPF progs are not allowed to arm the timer to prevent UAF.
> + */
> + if (!refcount_inc_not_zero(&t->cb.refcnt))
> + return -ENOENT;
> +
> + if (!in_hardirq()) {
> + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> + bpf_async_refcount_put(&t->cb);
> + return 0;
> + } else {
> + return bpf_async_schedule_op(&t->cb, BPF_ASYNC_START, nsecs, mode);
> + }
> }
>
> static const struct bpf_func_proto bpf_timer_start_proto = {
> @@ -1477,11 +1524,9 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, async)
> bool inc = false;
> int ret = 0;
>
> - if (in_nmi())
> + if (in_hardirq())
> return -EOPNOTSUPP;
>
> - guard(rcu)();
> -
> t = READ_ONCE(async->timer);
> if (!t)
> return -EINVAL;
> @@ -1536,78 +1581,85 @@ static const struct bpf_func_proto bpf_timer_cancel_proto = {
> .arg1_type = ARG_PTR_TO_TIMER,
> };
>
> -static struct bpf_async_cb *__bpf_async_cancel_and_free(struct bpf_async_kern *async)
> +static void bpf_async_process_op(struct bpf_async_cb *cb, u32 op,
> + u64 timer_nsec, u32 timer_mode)
> +{
> + switch (cb->type) {
> + case BPF_ASYNC_TYPE_TIMER: {
> + struct bpf_hrtimer *t = container_of(cb, struct bpf_hrtimer, cb);
> +
> + switch (op) {
> + case BPF_ASYNC_START:
> + hrtimer_start(&t->timer, ns_to_ktime(timer_nsec), timer_mode);
> + break;
> + case BPF_ASYNC_CANCEL:
> + hrtimer_try_to_cancel(&t->timer);
> + break;
> + }
> + break;
> + }
> + case BPF_ASYNC_TYPE_WQ: {
> + struct bpf_work *w = container_of(cb, struct bpf_work, cb);
> +
> + switch (op) {
> + case BPF_ASYNC_START:
> + schedule_work(&w->work);
> + break;
> + case BPF_ASYNC_CANCEL:
> + cancel_work(&w->work);
> + break;
> + }
> + break;
> + }
> + }
> + bpf_async_refcount_put(cb);
> +}
> +
> +static void bpf_async_irq_worker(struct irq_work *work)
> +{
> + struct bpf_async_cb *cb = container_of(work, struct bpf_async_cb, worker);
> + struct llist_node *pos, *n, *list;
> +
> + list = llist_del_all(&cb->async_cmds);
> + if (!list)
> + return;
> +
> + list = llist_reverse_order(list);
> + llist_for_each_safe(pos, n, list) {
> + struct bpf_async_cmd *cmd;
> +
> + cmd = container_of(pos, struct bpf_async_cmd, node);
> + bpf_async_process_op(cb, cmd->op, cmd->nsec, cmd->mode);
> + kfree_nolock(cmd);
> + }
> +}
> +
> +static void bpf_async_cancel_and_free(struct bpf_async_kern *async)
> {
> struct bpf_async_cb *cb;
>
> - /* Performance optimization: read async->cb without lock first. */
> if (!READ_ONCE(async->cb))
> - return NULL;
> + return;
>
> - __bpf_spin_lock_irqsave(&async->lock);
> - /* re-read it under lock */
> - cb = async->cb;
> + cb = xchg(&async->cb, NULL);
> if (!cb)
> - goto out;
> - bpf_async_update_prog_callback(cb, NULL, NULL);
> - /* The subsequent bpf_timer_start/cancel() helpers won't be able to use
> - * this timer, since it won't be initialized.
> - */
> - WRITE_ONCE(async->cb, NULL);
> -out:
> - __bpf_spin_unlock_irqrestore(&async->lock);
> - return cb;
> -}
> + return;
>
> -static void bpf_timer_delete(struct bpf_hrtimer *t)
> -{
> /*
> - * We check that bpf_map_delete/update_elem() was called from timer
> - * callback_fn. In such case we don't call hrtimer_cancel() (since it
> - * will deadlock) and don't call hrtimer_try_to_cancel() (since it will
> - * just return -1). Though callback_fn is still running on this cpu it's
> - * safe to do kfree(t) because bpf_timer_cb() read everything it needed
> - * from 't'. The bpf subprog callback_fn won't be able to access 't',
> - * since async->cb = NULL was already done. The timer will be
> - * effectively cancelled because bpf_timer_cb() will return
> - * HRTIMER_NORESTART.
> - *
> - * However, it is possible the timer callback_fn calling us armed the
> - * timer _before_ calling us, such that failing to cancel it here will
> - * cause it to possibly use struct hrtimer after freeing bpf_hrtimer.
> - * Therefore, we _need_ to cancel any outstanding timers before we do
> - * call_rcu, even though no more timers can be armed.
> - *
> - * Moreover, we need to schedule work even if timer does not belong to
> - * the calling callback_fn, as on two different CPUs, we can end up in a
> - * situation where both sides run in parallel, try to cancel one
> - * another, and we end up waiting on both sides in hrtimer_cancel
> - * without making forward progress, since timer1 depends on time2
> - * callback to finish, and vice versa.
> - *
> - * CPU 1 (timer1_cb) CPU 2 (timer2_cb)
> - * bpf_timer_cancel_and_free(timer2) bpf_timer_cancel_and_free(timer1)
> - *
> - * To avoid these issues, punt to workqueue context when we are in a
> - * timer callback.
> + * No refcount_inc_not_zero(&cb->refcnt) here. Dropping the last
> + * refcnt. Either synchronously or asynchronously in irq_work.
> */
> - if (this_cpu_read(hrtimer_running)) {
> - queue_work(system_dfl_wq, &t->cb.delete_work);
> - return;
> - }
>
> - if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> - /* If the timer is running on other CPU, also use a kworker to
> - * wait for the completion of the timer instead of trying to
> - * acquire a sleepable lock in hrtimer_cancel() to wait for its
> - * completion.
> - */
> - if (hrtimer_try_to_cancel(&t->timer) >= 0)
> - call_rcu(&t->cb.rcu, bpf_async_cb_rcu_free);
> - else
> - queue_work(system_dfl_wq, &t->cb.delete_work);
> + if (!in_hardirq()) {
> + bpf_async_process_op(cb, BPF_ASYNC_CANCEL, 0, 0);
> } else {
> - bpf_timer_delete_work(&t->cb.delete_work);
> + (void)bpf_async_schedule_op(cb, BPF_ASYNC_CANCEL, 0, 0);
> + /*
> + * bpf_async_schedule_op() either enqueues allocated cmd into llist
> + * or fails with ENOMEM and drop the last refcnt.
> + * This is unlikely, but safe, since bpf_async_cb_rcu_tasks_trace_free()
> + * callback will do additional timer/wq_cancel due to races anyway.
> + */
What if we simplify this further and remove the cancellation here altogether,
and instead rely on the final cancel in the rcu callback? Is it just to run
the cancel as early as possible to optimize the common case?
> }
> }
>
> @@ -1617,33 +1669,16 @@ static void bpf_timer_delete(struct bpf_hrtimer *t)
> */
> void bpf_timer_cancel_and_free(void *val)
> {
> - struct bpf_hrtimer *t;
> -
> - t = (struct bpf_hrtimer *)__bpf_async_cancel_and_free(val);
> - if (!t)
> - return;
> -
> - bpf_timer_delete(t);
> + bpf_async_cancel_and_free(val);
> }
>
> -/* This function is called by map_delete/update_elem for individual element and
> +/*
> + * This function is called by map_delete/update_elem for individual element and
> * by ops->map_release_uref when the user space reference to a map reaches zero.
> */
> void bpf_wq_cancel_and_free(void *val)
> {
> - struct bpf_work *work;
> -
> - BTF_TYPE_EMIT(struct bpf_wq);
> -
> - work = (struct bpf_work *)__bpf_async_cancel_and_free(val);
> - if (!work)
> - return;
> - /* Trigger cancel of the sleepable work, but *do not* wait for
> - * it to finish if it was running as we might not be in a
> - * sleepable context.
> - * kfree will be called once the work has finished.
> - */
> - schedule_work(&work->delete_work);
> + bpf_async_cancel_and_free(val);
> }
>
> BPF_CALL_2(bpf_kptr_xchg, void *, dst, void *, ptr)
> @@ -3116,16 +3151,23 @@ __bpf_kfunc int bpf_wq_start(struct bpf_wq *wq, unsigned int flags)
> struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
> struct bpf_work *w;
>
> - if (in_nmi())
> - return -EOPNOTSUPP;
> if (flags)
> return -EINVAL;
> +
> w = READ_ONCE(async->work);
> if (!w || !READ_ONCE(w->cb.prog))
> return -EINVAL;
>
> - schedule_work(&w->work);
> - return 0;
> + if (!refcount_inc_not_zero(&w->cb.refcnt))
> + return -ENOENT;
> +
> + if (!in_hardirq()) {
> + schedule_work(&w->work);
> + bpf_async_refcount_put(&w->cb);
> + return 0;
> + } else {
> + return bpf_async_schedule_op(&w->cb, BPF_ASYNC_START, 0, 0);
> + }
> }
>
> __bpf_kfunc int bpf_wq_set_callback(struct bpf_wq *wq,
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context
2026-02-02 13:36 ` Mykyta Yatsenko
@ 2026-02-02 17:29 ` Alexei Starovoitov
0 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-02 17:29 UTC (permalink / raw)
To: Mykyta Yatsenko
Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Kernel Team
On Mon, Feb 2, 2026 at 5:36 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> > + switch (cb->type) {
> > + case BPF_ASYNC_TYPE_TIMER:
> > + if (hrtimer_try_to_cancel(&t->timer) < 0)
> > + retry = true;
> > + break;
> > + case BPF_ASYNC_TYPE_WQ:
> > + if (!cancel_work(&w->work))
> > + retry = true;
> > + break;
> > + }
> > + if (retry) {
> Isn't it the case that both timer and workqueue callbacks imply rcu locks?
> What scenario I'm not accounting for, thinking we can't get here?
The callbacks will call rcu_read_lock, but the wq/timer infra won't enter
an rcu CS before that. So the callback may start executing after the rcu tasks
trace GP and needs to be cancelled before we free the bpf_async_cb.
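Roughly, the window is (simplified sketch, not code from the patch):

  map_delete_elem()
    -> bpf_async_cancel_and_free(): refcnt drops to 0, tries to cancel,
       but a racing bpf_timer_start() can still arm the timer;
       call_rcu_tasks_trace(&cb->rcu, ...) is queued
  ... rcu tasks trace GP elapses ...
  hrtimer expires: the hrtimer core starts invoking bpf_timer_cb() and
       has not entered any rcu read-side section yet at that point
  bpf_async_cb_rcu_tasks_trace_free() runs:
       hrtimer_try_to_cancel() returns -1 (callback running) -> retry,
       i.e. wait for one more GP before kfree_nolock(cb)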
> > + if (!in_hardirq()) {
> > + bpf_async_process_op(cb, BPF_ASYNC_CANCEL, 0, 0);
> > } else {
> > - bpf_timer_delete_work(&t->cb.delete_work);
> > + (void)bpf_async_schedule_op(cb, BPF_ASYNC_CANCEL, 0, 0);
> > + /*
> > + * bpf_async_schedule_op() either enqueues allocated cmd into llist
> > + * or fails with ENOMEM and drop the last refcnt.
> > + * This is unlikely, but safe, since bpf_async_cb_rcu_tasks_trace_free()
> > + * callback will do additional timer/wq_cancel due to races anyway.
> > + */
> What if we simplify this further and remove cancellation here at all,
> instead rely on last cancel in rcu callback? Is it just to run cancel
> as early as possible to optimize the common case?
Yes. Callbacks scheduled in the future will be cancelled here,
and most likely a single rcu tasks trace GP will be enough.
If we don't cancel here there will be two tasks trace GPs, which is
an unnecessary delay.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context
2026-02-01 2:53 ` [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context Alexei Starovoitov
2026-02-02 13:36 ` Mykyta Yatsenko
@ 2026-02-03 22:14 ` Kumar Kartikeya Dwivedi
2026-02-03 23:53 ` Andrii Nakryiko
2 siblings, 0 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-02-03 22:14 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, daniel, andrii, martin.lau, mykyta.yatsenko5, kernel-team
On Sun, 1 Feb 2026 at 03:54, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Refactor bpf_timer and bpf_wq to allow calling them from any context:
> - add refcnt to bpf_async_cb
> - map_delete_elem or map_free will drop refcnt to zero
> via bpf_async_cancel_and_free()
> - once refcnt is zero timer/wq_start is not allowed to make sure
> that callback cannot rearm itself
> - if in_hardirq defer to start/cancel operations to irq_work
>
> Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> [...]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context
2026-02-01 2:53 ` [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context Alexei Starovoitov
2026-02-02 13:36 ` Mykyta Yatsenko
2026-02-03 22:14 ` Kumar Kartikeya Dwivedi
@ 2026-02-03 23:53 ` Andrii Nakryiko
2026-02-04 0:32 ` Alexei Starovoitov
2 siblings, 1 reply; 21+ messages in thread
From: Andrii Nakryiko @ 2026-02-03 23:53 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, daniel, andrii, martin.lau, memxor, mykyta.yatsenko5,
kernel-team
On Sat, Jan 31, 2026 at 6:54 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Refactor bpf_timer and bpf_wq to allow calling them from any context:
> - add refcnt to bpf_async_cb
> - map_delete_elem or map_free will drop refcnt to zero
> via bpf_async_cancel_and_free()
> - once refcnt is zero timer/wq_start is not allowed to make sure
> that callback cannot rearm itself
> - if in_hardirq defer to start/cancel operations to irq_work
>
> Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
> kernel/bpf/helpers.c | 408 ++++++++++++++++++++++++-------------------
> 1 file changed, 225 insertions(+), 183 deletions(-)
>
[...]
> @@ -1456,10 +1493,20 @@ BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, fla
> if (flags & BPF_F_TIMER_CPU_PIN)
> mode |= HRTIMER_MODE_PINNED;
>
> - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> -out:
> - __bpf_spin_unlock_irqrestore(&timer->lock);
> - return ret;
> + /*
> + * bpf_async_cancel_and_free() could have dropped refcnt to zero. In
> + * such case BPF progs are not allowed to arm the timer to prevent UAF.
> + */
> + if (!refcount_inc_not_zero(&t->cb.refcnt))
> + return -ENOENT;
> +
> + if (!in_hardirq()) {
> + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> + bpf_async_refcount_put(&t->cb);
any reason not to use bpf_async_process_op(BPF_ASYNC_START) here? you
are basically open-coding it here (same for bpf_wq_start). For
bpf_timer_delete() you just call
bpf_async_process_op(BPF_ASYNC_CANCEL), so it's inconsistent...
I can fix it up while applying, let me know
> + return 0;
> + } else {
> + return bpf_async_schedule_op(&t->cb, BPF_ASYNC_START, nsecs, mode);
> + }
> }
>
> static const struct bpf_func_proto bpf_timer_start_proto = {
[...]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context
2026-02-03 23:53 ` Andrii Nakryiko
@ 2026-02-04 0:32 ` Alexei Starovoitov
2026-02-04 0:53 ` Andrii Nakryiko
0 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2026-02-04 0:32 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team
On Tue, Feb 3, 2026 at 3:53 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Jan 31, 2026 at 6:54 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Refactor bpf_timer and bpf_wq to allow calling them from any context:
> > - add refcnt to bpf_async_cb
> > - map_delete_elem or map_free will drop refcnt to zero
> > via bpf_async_cancel_and_free()
> > - once refcnt is zero timer/wq_start is not allowed to make sure
> > that callback cannot rearm itself
> > - if in_hardirq defer to start/cancel operations to irq_work
> >
> > Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
> > Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> > kernel/bpf/helpers.c | 408 ++++++++++++++++++++++++-------------------
> > 1 file changed, 225 insertions(+), 183 deletions(-)
> >
>
> [...]
>
> > @@ -1456,10 +1493,20 @@ BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, fla
> > if (flags & BPF_F_TIMER_CPU_PIN)
> > mode |= HRTIMER_MODE_PINNED;
> >
> > - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> > -out:
> > - __bpf_spin_unlock_irqrestore(&timer->lock);
> > - return ret;
> > + /*
> > + * bpf_async_cancel_and_free() could have dropped refcnt to zero. In
> > + * such case BPF progs are not allowed to arm the timer to prevent UAF.
> > + */
> > + if (!refcount_inc_not_zero(&t->cb.refcnt))
> > + return -ENOENT;
> > +
> > + if (!in_hardirq()) {
> > + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> > + bpf_async_refcount_put(&t->cb);
>
> any reason not to use bpf_async_process_op(BPF_ASYNC_START) here? you
> are basically open-coding it here (same for bpf_wq_start). For
> bpf_timer_delete() you just call
> bpf_async_process_op(BPF_ASYNC_CANCEL), so it's inconsistent...
I didn't want to obfuscate the call, and wanted to avoid the tiny overhead
of switch (cb->type), which won't be optimized away by the compiler.
bpf_async_process_op(BPF_ASYNC_CANCEL) is there right next
to bpf_async_schedule_op(BPF_ASYNC_CANCEL),
so I wanted to keep those two consistent.
3rd reason is in bpf_timer_cancel_async() kfunc that
calls hrtimer_try_to_cancel() directly to return the result.
See patch 3.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context
2026-02-04 0:32 ` Alexei Starovoitov
@ 2026-02-04 0:53 ` Andrii Nakryiko
2026-02-04 0:56 ` Andrii Nakryiko
0 siblings, 1 reply; 21+ messages in thread
From: Andrii Nakryiko @ 2026-02-04 0:53 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team
On Tue, Feb 3, 2026 at 4:32 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 3, 2026 at 3:53 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Jan 31, 2026 at 6:54 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > From: Alexei Starovoitov <ast@kernel.org>
> > >
> > > Refactor bpf_timer and bpf_wq to allow calling them from any context:
> > > - add refcnt to bpf_async_cb
> > > - map_delete_elem or map_free will drop refcnt to zero
> > > via bpf_async_cancel_and_free()
> > > - once refcnt is zero timer/wq_start is not allowed to make sure
> > > that callback cannot rearm itself
> > > - if in_hardirq defer to start/cancel operations to irq_work
> > >
> > > Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
> > > Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> > > kernel/bpf/helpers.c | 408 ++++++++++++++++++++++++-------------------
> > > 1 file changed, 225 insertions(+), 183 deletions(-)
> > >
> >
> > [...]
> >
> > > @@ -1456,10 +1493,20 @@ BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, fla
> > > if (flags & BPF_F_TIMER_CPU_PIN)
> > > mode |= HRTIMER_MODE_PINNED;
> > >
> > > - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> > > -out:
> > > - __bpf_spin_unlock_irqrestore(&timer->lock);
> > > - return ret;
> > > + /*
> > > + * bpf_async_cancel_and_free() could have dropped refcnt to zero. In
> > > + * such case BPF progs are not allowed to arm the timer to prevent UAF.
> > > + */
> > > + if (!refcount_inc_not_zero(&t->cb.refcnt))
> > > + return -ENOENT;
> > > +
> > > + if (!in_hardirq()) {
> > > + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> > > + bpf_async_refcount_put(&t->cb);
> >
> > any reason not to use bpf_async_process_op(BPF_ASYNC_START) here? you
> > are basically open-coding it here (same for bpf_wq_start). For
> > bpf_timer_delete() you just call
> > bpf_async_process_op(BPF_ASYNC_CANCEL), so it's inconsistent...
>
> I didn't want to obfuscate the call and to avoid tiny overhead
> of switch (cb->type) which won't be optimized away by the compiler.
>
> bpf_async_process_op(BPF_ASYNC_CANCEL) is there right next
> to bpf_async_schedule_op(BPF_ASYNC_CANCEL),
> so I wanted to keep those two consistent.
Same as for BPF_ASYNC_START, not sure what's different there. I still
find it inconsistent with the synchronous bpf_timer_cancel(), but
whatever, I'll apply as is.
>
> 3rd reason is in bpf_timer_cancel_async() kfunc that
> calls hrtimer_try_to_cancel() directly to return the result.
> See patch 3.
yes, I noticed this one.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context
2026-02-04 0:53 ` Andrii Nakryiko
@ 2026-02-04 0:56 ` Andrii Nakryiko
0 siblings, 0 replies; 21+ messages in thread
From: Andrii Nakryiko @ 2026-02-04 0:56 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Mykyta Yatsenko, Kernel Team
On Tue, Feb 3, 2026 at 4:53 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Feb 3, 2026 at 4:32 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Feb 3, 2026 at 3:53 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Sat, Jan 31, 2026 at 6:54 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > >
> > > > Refactor bpf_timer and bpf_wq to allow calling them from any context:
> > > > - add refcnt to bpf_async_cb
> > > > - map_delete_elem or map_free will drop refcnt to zero
> > > > via bpf_async_cancel_and_free()
> > > > - once refcnt is zero timer/wq_start is not allowed to make sure
> > > > that callback cannot rearm itself
> > > > - if in_hardirq defer to start/cancel operations to irq_work
> > > >
> > > > Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
> > > > Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > ---
> > > > kernel/bpf/helpers.c | 408 ++++++++++++++++++++++++-------------------
> > > > 1 file changed, 225 insertions(+), 183 deletions(-)
> > > >
> > >
> > > [...]
> > >
> > > > @@ -1456,10 +1493,20 @@ BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, fla
> > > > if (flags & BPF_F_TIMER_CPU_PIN)
> > > > mode |= HRTIMER_MODE_PINNED;
> > > >
> > > > - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> > > > -out:
> > > > - __bpf_spin_unlock_irqrestore(&timer->lock);
> > > > - return ret;
> > > > + /*
> > > > + * bpf_async_cancel_and_free() could have dropped refcnt to zero. In
> > > > + * such case BPF progs are not allowed to arm the timer to prevent UAF.
> > > > + */
> > > > + if (!refcount_inc_not_zero(&t->cb.refcnt))
> > > > + return -ENOENT;
> > > > +
> > > > + if (!in_hardirq()) {
> > > > + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> > > > + bpf_async_refcount_put(&t->cb);
> > >
> > > any reason not to use bpf_async_process_op(BPF_ASYNC_START) here? you
> > > are basically open-coding it here (same for bpf_wq_start). For
> > > bpf_timer_delete() you just call
> > > bpf_async_process_op(BPF_ASYNC_CANCEL), so it's inconsistent...
> >
> > I didn't want to obfuscate the call and to avoid tiny overhead
> > of switch (cb->type) which won't be optimized away by the compiler.
> >
> > bpf_async_process_op(BPF_ASYNC_CANCEL) is there right next
> > to bpf_async_schedule_op(BPF_ASYNC_CANCEL),
> > so I wanted to keep those two consistent.
>
> Same as for BPF_ASYNC_START, not sure what's different there. I still
> find it inconsistent with the synchronous bpf_timer_cancel(), but
> whatever, I'll apply as is.
Ah, wait, the BPF_ASYNC_CANCEL I was talking about is actually in
cancel_and_free() (which is asynchronous). I keep getting confused by
that large block of removed lines and bpf_timer_delete() in between.
Never mind!
>
> >
> > 3rd reason is in bpf_timer_cancel_async() kfunc that
> > calls hrtimer_try_to_cancel() directly to return the result.
> > See patch 3.
>
> yes, I noticed this one.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
` (8 preceding siblings ...)
2026-02-01 2:54 ` [PATCH v9 bpf-next 9/9] selftests/bpf: Add a test to stress bpf_timer_start and map_delete race Alexei Starovoitov
@ 2026-02-04 1:10 ` patchwork-bot+netdevbpf
9 siblings, 0 replies; 21+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-02-04 1:10 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, daniel, andrii, martin.lau, memxor, mykyta.yatsenko5,
kernel-team
Hello:
This series was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:
On Sat, 31 Jan 2026 18:53:54 -0800 you wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> This series reworks implementation of BPF timer and workqueue APIs to
> make them usable from any context.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>
> [...]
Here is the summary with links:
- [v9,bpf-next,1/9] bpf: Enable bpf_timer and bpf_wq in any context
https://git.kernel.org/bpf/bpf-next/c/1bfbc267ec91
- [v9,bpf-next,2/9] bpf: Add verifier support for bpf_timer argument in kfuncs
https://git.kernel.org/bpf/bpf-next/c/19bd300e22c2
- [v9,bpf-next,3/9] bpf: Introduce bpf_timer_cancel_async() kfunc
https://git.kernel.org/bpf/bpf-next/c/a7e172aa4ca2
- [v9,bpf-next,4/9] selftests/bpf: Refactor timer selftests
https://git.kernel.org/bpf/bpf-next/c/10653c0dd868
- [v9,bpf-next,5/9] selftests/bpf: Add stress test for timer async cancel
https://git.kernel.org/bpf/bpf-next/c/d02fdd7195ca
- [v9,bpf-next,6/9] selftests/bpf: Verify bpf_timer_cancel_async works
https://git.kernel.org/bpf/bpf-next/c/fe9d205cec8c
- [v9,bpf-next,7/9] selftests/bpf: Add timer stress test in NMI context
https://git.kernel.org/bpf/bpf-next/c/083c5a4babad
- [v9,bpf-next,8/9] selftests/bpf: Removed obsolete tests
https://git.kernel.org/bpf/bpf-next/c/3f7a8415209e
- [v9,bpf-next,9/9] selftests/bpf: Add a test to stress bpf_timer_start and map_delete race
https://git.kernel.org/bpf/bpf-next/c/b135beb07758
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2026-02-04 1:10 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-01 2:53 [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 1/9] bpf: Enable bpf_timer and bpf_wq in any context Alexei Starovoitov
2026-02-02 13:36 ` Mykyta Yatsenko
2026-02-02 17:29 ` Alexei Starovoitov
2026-02-03 22:14 ` Kumar Kartikeya Dwivedi
2026-02-03 23:53 ` Andrii Nakryiko
2026-02-04 0:32 ` Alexei Starovoitov
2026-02-04 0:53 ` Andrii Nakryiko
2026-02-04 0:56 ` Andrii Nakryiko
2026-02-01 2:53 ` [PATCH v9 bpf-next 2/9] bpf: Add verifier support for bpf_timer argument in kfuncs Alexei Starovoitov
2026-02-01 3:15 ` bot+bpf-ci
2026-02-01 2:53 ` [PATCH v9 bpf-next 3/9] bpf: Introduce bpf_timer_cancel_async() kfunc Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 4/9] selftests/bpf: Refactor timer selftests Alexei Starovoitov
2026-02-01 2:53 ` [PATCH v9 bpf-next 5/9] selftests/bpf: Add stress test for timer async cancel Alexei Starovoitov
2026-02-01 2:54 ` [PATCH v9 bpf-next 6/9] selftests/bpf: Verify bpf_timer_cancel_async works Alexei Starovoitov
2026-02-01 2:54 ` [PATCH v9 bpf-next 7/9] selftests/bpf: Add timer stress test in NMI context Alexei Starovoitov
2026-02-01 2:54 ` [PATCH v9 bpf-next 8/9] selftests/bpf: Removed obsolete tests Alexei Starovoitov
2026-02-01 2:54 ` [PATCH v9 bpf-next 9/9] selftests/bpf: Add a test to stress bpf_timer_start and map_delete race Alexei Starovoitov
2026-02-01 3:15 ` bot+bpf-ci
2026-02-01 3:30 ` Alexei Starovoitov
2026-02-04 1:10 ` [PATCH v9 bpf-next 0/9] bpf: Avoid locks in bpf_timer and bpf_wq patchwork-bot+netdevbpf