[PATCH v3 sched_ext/for-7.2 0/2] sched_ext: Add dispatch transaction API

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 sched_ext/for-7.2 0/2] sched_ext: Add dispatch transaction API
@ 2026-05-09 19:11 Cheng-Yang Chou
  2026-05-09 19:11 ` [PATCH v3 1/2] " Cheng-Yang Chou
  2026-05-09 19:11 ` [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test Cheng-Yang Chou
  0 siblings, 2 replies; 13+ messages in thread
From: Cheng-Yang Chou @ 2026-05-09 19:11 UTC (permalink / raw)
  To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
  Cc: Kuba Piecuch, Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911

scx_bpf_dsq_insert() captures the task's sequence number at insert time,
so any pre-dispatch validity checks a BPF scheduler performs before the
insert fall outside the race detection window.

Patch 1: Introduce a dispatch transaction API with two new kfuncs.
- scx_bpf_dsq_insert_begin() captures the sequence number before any
  pre-dispatch checks, opening the transaction.
- scx_bpf_dsq_insert_commit() inserts the task using the early-captured
  token, extending the race detection window. If the token has gone stale
  (task was dequeued or claimed by another CPU), finish_dispatch()
  silently discards the entry. This is intended for schedulers that do
  not implement properly synchronized dequeue.

Patch 2: Add a selftest exercising the dispatch transaction API. The BPF
scheduler captures the token in ops.dispatch() and flips CPU affinity on
child tasks to trigger dequeue and re-enqueue, verifying that both
successful and stale-token dispatch paths are exercised.

Test plan:
- Applied this series and ran selftests with vng, all tests pass.
- Confirmed selftest skips on older kernel.

This series is discussed in [1].
[1]: https://lore.kernel.org/r/20260319083518.94673-1-arighi@nvidia.com/

Changes in v3:
- Rename cookie API to dispatch transaction API: rename kfuncs from
  scx_bpf_task_get_cookie/scx_bpf_dsq_insert_with_cookie to
  scx_bpf_dsq_insert_begin/scx_bpf_dsq_insert_commit (Tejun Heo)
- Rename internal scx_dsq_insert_commit() to scx_dsq_insert_buf() to
  avoid collision with the new public kfunc (Tejun Heo)
- Move __ksym __weak declarations in common.bpf.h into the API commit
- Redesign selftest to call insert from ops.dispatch() with CPU affinity
  flips to actually exercise the race detection window. v2 called from
  ops.enqueue() where the cookie was silently ignored (Tejun Heo)
- Skip selftest on older kernels lacking the dispatch transaction API
- Apply the same fix logic as commit e18002d2c791 to the selftest
- Link to v2:
  https://lore.kernel.org/r/20260506160412.522199-1-yphbchou0911@gmail.com/

Changes in v2:
- Reword kdoc for both kfuncs to avoid exposing qseq and ops_state as
  implementation details (Kuba Piecuch)
- Add __ksym __weak declarations for new kfuncs in common.bpf.h
- Add dispatch_cookie selftest (Kuba Piecuch)
- Link to v1:
  https://lore.kernel.org/r/20260506075925.371138-1-yphbchou0911@gmail.com/

Thanks,
Cheng-Yang

---

Cheng-Yang Chou (2):
  sched_ext: Add dispatch transaction API
  selftests/sched_ext: Add dispatch_cookie test

 kernel/sched/ext.c                            |  72 ++++++++++++-
 tools/sched_ext/include/scx/common.bpf.h      |   2 +
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../selftests/sched_ext/dispatch_cookie.bpf.c | 102 ++++++++++++++++++
 .../selftests/sched_ext/dispatch_cookie.c     | 100 +++++++++++++++++
 5 files changed, 272 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/sched_ext/dispatch_cookie.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/dispatch_cookie.c

-- 
2.48.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/2] sched_ext: Add dispatch transaction API
  2026-05-09 19:11 [PATCH v3 sched_ext/for-7.2 0/2] sched_ext: Add dispatch transaction API Cheng-Yang Chou
@ 2026-05-09 19:11 ` Cheng-Yang Chou
  2026-05-09 19:49   ` sashiko-bot
                     ` (2 more replies)
  2026-05-09 19:11 ` [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test Cheng-Yang Chou
  1 sibling, 3 replies; 13+ messages in thread
From: Cheng-Yang Chou @ 2026-05-09 19:11 UTC (permalink / raw)
  To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
  Cc: Kuba Piecuch, Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911

scx_bpf_dsq_insert() captures the task's dispatch token at insert time.
Any BPF-side validity checks performed before the insert fall outside
the race detection window: a dequeue/re-enqueue occurring between the
check and the insert goes undetected, and finish_dispatch() proceeds
with stale assumptions.

Introduce two new kfuncs to extend the detection window via a dispatch
transaction:

- scx_bpf_dsq_insert_begin(p)
    Starts a dispatch transaction for @p and returns an opaque u64
    token. The BPF scheduler should call this before performing
    pre-dispatch validity checks. The token may be stored in BPF maps
    to support cross-CPU dispatch patterns.

- scx_bpf_dsq_insert_commit(p, dsq_id, enq_flags, token)
    Like scx_bpf_dsq_insert() with slice=0, but commits using the
    token captured by scx_bpf_dsq_insert_begin(). If @p was dequeued
    or claimed between begin and commit, the transaction is silently
    discarded. Use scx_bpf_task_set_slice() to set a non-default slice.

To support explicit token passing, rename scx_dsq_insert_commit() to
scx_dsq_insert_buf() and add a qseq parameter. All existing callers
preserve the original behavior.

This mechanism is intended for schedulers that do not implement
properly synchronized dequeue. A scheduler whose ops.dequeue()
synchronizes atomically with the dispatch path does not need this API.

Suggested-by: Tejun Heo <tj@kernel.org>
Suggested-by: Kuba Piecuch <jpiecuch@google.com>
Suggested-by: Andrea Righi <arighi@nvidia.com>
Reported-by: Andrea Righi <arighi@nvidia.com>
Link: https://lore.kernel.org/r/20260203230639.1259869-1-arighi@nvidia.com/
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
 kernel/sched/ext.c                       | 72 ++++++++++++++++++++++--
 tools/sched_ext/include/scx/common.bpf.h |  2 +
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index b2741b6fb046..81483520f5cc 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -8315,8 +8315,8 @@ static bool scx_dsq_insert_preamble(struct scx_sched *sch, struct task_struct *p
 	return true;
 }
 
-static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
-				  u64 dsq_id, u64 enq_flags)
+static void scx_dsq_insert_buf(struct scx_sched *sch, struct task_struct *p,
+			       u64 dsq_id, u64 enq_flags, unsigned long qseq)
 {
 	struct scx_dsp_ctx *dspc = &this_cpu_ptr(sch->pcpu)->dsp_ctx;
 	struct task_struct *ddsp_task;
@@ -8334,7 +8334,7 @@ static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
 
 	dspc->buf[dspc->cursor++] = (struct scx_dsp_buf_ent){
 		.task = p,
-		.qseq = atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK,
+		.qseq = qseq,
 		.dsq_id = dsq_id,
 		.enq_flags = enq_flags,
 	};
@@ -8401,7 +8401,8 @@ __bpf_kfunc bool scx_bpf_dsq_insert___v2(struct task_struct *p, u64 dsq_id,
 	else
 		p->scx.slice = p->scx.slice ?: 1;
 
-	scx_dsq_insert_commit(sch, p, dsq_id, enq_flags);
+	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags,
+			      atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK);
 
 	return true;
 }
@@ -8429,7 +8430,8 @@ static bool scx_dsq_insert_vtime(struct scx_sched *sch, struct task_struct *p,
 
 	p->scx.dsq_vtime = vtime;
 
-	scx_dsq_insert_commit(sch, p, dsq_id, enq_flags | SCX_ENQ_DSQ_PRIQ);
+	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags | SCX_ENQ_DSQ_PRIQ,
+			      atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK);
 
 	return true;
 }
@@ -8518,13 +8520,72 @@ __bpf_kfunc void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64 dsq_id,
 	scx_dsq_insert_vtime(sch, p, dsq_id, slice, vtime, enq_flags);
 }
 
+/**
+ * scx_bpf_dsq_insert_begin - Begin a dispatch transaction for a task
+ * @p: task_struct to dispatch
+ *
+ * Returns an opaque u64 dispatch token. Pass the token to
+ * scx_bpf_dsq_insert_commit() to insert @p into a DSQ. If @p is dequeued
+ * or claimed by another path between scx_bpf_dsq_insert_begin() and
+ * scx_bpf_dsq_insert_commit(), the commit will silently fail.
+ *
+ * This API is intended for schedulers that do not implement properly
+ * synchronized dequeue.
+ */
+__bpf_kfunc u64 scx_bpf_dsq_insert_begin(struct task_struct *p)
+{
+	return atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK;
+}
+
+/**
+ * scx_bpf_dsq_insert_commit - Commit a dispatch transaction
+ * @p: task_struct to insert
+ * @dsq_id: DSQ to insert into
+ * @enq_flags: SCX_ENQ_*
+ * @token: token from scx_bpf_dsq_insert_begin()
+ * @aux: implicit BPF argument
+ *
+ * Like scx_bpf_dsq_insert() with slice=0, but commits a dispatch transaction
+ * begun with scx_bpf_dsq_insert_begin(). If @p was dequeued or claimed
+ * between begin and commit, the dispatch is silently discarded. Use
+ * scx_bpf_task_set_slice() to set a non-default slice.
+ *
+ * Returns %true if the entry was buffered for dispatch, %false on preamble
+ * failure (e.g. @p is not owned by this scheduler). Note: stale token
+ * detection fires asynchronously in finish_dispatch() after ops.dispatch()
+ * returns. A %true return does not guarantee the task was actually dispatched.
+ */
+__bpf_kfunc bool scx_bpf_dsq_insert_commit(struct task_struct *p,
+					    u64 dsq_id, u64 enq_flags,
+					    u64 token,
+					    const struct bpf_prog_aux *aux)
+{
+	struct scx_sched *sch;
+
+	guard(rcu)();
+	sch = scx_prog_sched(aux);
+	if (unlikely(!sch))
+		return false;
+
+	if (!scx_dsq_insert_preamble(sch, p, dsq_id, &enq_flags))
+		return false;
+
+	p->scx.slice = p->scx.slice ?: 1;
+
+	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags, (unsigned long)token);
+
+	return true;
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(scx_kfunc_ids_enqueue_dispatch)
 BTF_ID_FLAGS(func, scx_bpf_dsq_insert, KF_IMPLICIT_ARGS | KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_dsq_insert___v2, KF_IMPLICIT_ARGS | KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_dsq_insert_commit, KF_IMPLICIT_ARGS | KF_RCU)
 BTF_ID_FLAGS(func, __scx_bpf_dsq_insert_vtime, KF_IMPLICIT_ARGS | KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_dsq_insert_vtime, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_RCU)
 BTF_KFUNCS_END(scx_kfunc_ids_enqueue_dispatch)
 
 static const struct btf_kfunc_id_set scx_kfunc_set_enqueue_dispatch = {
@@ -10194,6 +10255,7 @@ BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
 BTF_ID_FLAGS(func, scx_bpf_task_running, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_task_cid, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
 BTF_ID_FLAGS(func, scx_bpf_locked_rq, KF_IMPLICIT_ARGS | KF_RET_NULL)
 BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
diff --git a/tools/sched_ext/include/scx/common.bpf.h b/tools/sched_ext/include/scx/common.bpf.h
index 5f715d69cde6..fb793008e2e3 100644
--- a/tools/sched_ext/include/scx/common.bpf.h
+++ b/tools/sched_ext/include/scx/common.bpf.h
@@ -63,6 +63,8 @@ s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 s32 __scx_bpf_select_cpu_and(struct task_struct *p, const struct cpumask *cpus_allowed,
 			     struct scx_bpf_select_cpu_and_args *args) __ksym __weak;
 bool __scx_bpf_dsq_insert_vtime(struct task_struct *p, struct scx_bpf_dsq_insert_vtime_args *args) __ksym __weak;
+u64 scx_bpf_dsq_insert_begin(struct task_struct *p) __ksym __weak;
+bool scx_bpf_dsq_insert_commit(struct task_struct *p, u64 dsq_id, u64 enq_flags, u64 token) __ksym __weak;
 u32 scx_bpf_dispatch_nr_slots(void) __ksym;
 void scx_bpf_dispatch_cancel(void) __ksym;
 void scx_bpf_kick_cpu(s32 cpu, u64 flags) __ksym;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] sched_ext: Add dispatch transaction API
  2026-05-09 19:11 ` [PATCH v3 1/2] " Cheng-Yang Chou
@ 2026-05-09 19:49   ` sashiko-bot
  2026-05-09 21:24   ` Andrea Righi
  2026-05-10 14:06   ` Tejun Heo
  2 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-05-09 19:49 UTC (permalink / raw)
  To: Cheng-Yang Chou; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] scx_bpf_dsq_insert_commit() writes to p->scx.slice before token validation, causing a data race if the task is running
- [Low] Duplicate kfunc ID registration for scx_bpf_dsq_insert_begin
--

commit 1b5ec039cdfb5b4d5077f7a5cfcc34f6f7393655
Author: Cheng-Yang Chou <yphbchou0911@gmail.com>

sched_ext: Add dispatch transaction API

This commit introduces scx_bpf_dsq_insert_begin() and
scx_bpf_dsq_insert_commit() to provide a dispatch transaction API.
It extends the race detection window for schedulers that perform BPF-side
validity checks without properly synchronized dequeue.

Link: https://lore.kernel.org/r/20260203230639.1259869-1-arighi@nvidia.com/

> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 1d7f206941f0c..05704a6a812a2 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c

[ ... ]

> @@ -8518,13 +8520,72 @@ __bpf_kfunc void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64 dsq_id,
> +__bpf_kfunc bool scx_bpf_dsq_insert_commit(struct task_struct *p,
> +					    u64 dsq_id, u64 enq_flags,
> +					    u64 token,
> +					    const struct bpf_prog_aux *aux)
> +{
> +	struct scx_sched *sch;
> +
> +	guard(rcu)();
> +	sch = scx_prog_sched(aux);
> +	if (unlikely(!sch))
> +		return false;
> +
> +	if (!scx_dsq_insert_preamble(sch, p, dsq_id, &enq_flags))
> +		return false;
> +
> +	p->scx.slice = p->scx.slice ?: 1;

Since token validation happens asynchronously in finish_dispatch() after
ops.dispatch() returns, is it safe to write to p->scx.slice here?

If the transaction API is meant to handle cases where the task might be
claimed by another CPU between begin and commit, the task could be actively
running elsewhere when this function is called.

Could this lockless write race with the active CPU's timer tick executing
task_tick_scx() (which decrements the slice concurrently)?

It appears this could result in a read-modify-write that clobbers the
concurrent decrement and corrupts the task's slice accounting.

[ ... ]

>  BTF_KFUNCS_START(scx_kfunc_ids_enqueue_dispatch)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert___v2, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_commit, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, __scx_bpf_dsq_insert_vtime, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert_vtime, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_RCU)
>  BTF_KFUNCS_END(scx_kfunc_ids_enqueue_dispatch)

[ ... ]

> @@ -10194,6 +10255,7 @@ BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
>  BTF_ID_FLAGS(func, scx_bpf_task_running, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_task_cid, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)

This isn't a bug, but is it intentional to register scx_bpf_dsq_insert_begin
in both kfunc ID sets?

Because scx_kfunc_context_filter() permits a kfunc if it is present in either
set, and both sets apply to the scheduler context, the duplicate registration
appears functionally redundant.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260509191223.168648-1-yphbchou0911@gmail.com?part=1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] sched_ext: Add dispatch transaction API
  2026-05-09 19:11 ` [PATCH v3 1/2] " Cheng-Yang Chou
  2026-05-09 19:49   ` sashiko-bot
@ 2026-05-09 21:24   ` Andrea Righi
  2026-05-10 14:02     ` Tejun Heo
  2026-05-10 14:06   ` Tejun Heo
  2 siblings, 1 reply; 13+ messages in thread
From: Andrea Righi @ 2026-05-09 21:24 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: sched-ext, Tejun Heo, David Vernet, Changwoo Min, Kuba Piecuch,
	Ching-Chun Huang, Chia-Ping Tsai

Hi Cheng-Yang,

On Sun, May 10, 2026 at 03:11:56AM +0800, Cheng-Yang Chou wrote:
> scx_bpf_dsq_insert() captures the task's dispatch token at insert time.
> Any BPF-side validity checks performed before the insert fall outside
> the race detection window: a dequeue/re-enqueue occurring between the
> check and the insert goes undetected, and finish_dispatch() proceeds
> with stale assumptions.
> 
> Introduce two new kfuncs to extend the detection window via a dispatch
> transaction:
> 
> - scx_bpf_dsq_insert_begin(p)
>     Starts a dispatch transaction for @p and returns an opaque u64
>     token. The BPF scheduler should call this before performing
>     pre-dispatch validity checks. The token may be stored in BPF maps
>     to support cross-CPU dispatch patterns.
> 
> - scx_bpf_dsq_insert_commit(p, dsq_id, enq_flags, token)
>     Like scx_bpf_dsq_insert() with slice=0, but commits using the
>     token captured by scx_bpf_dsq_insert_begin(). If @p was dequeued
>     or claimed between begin and commit, the transaction is silently
>     discarded. Use scx_bpf_task_set_slice() to set a non-default slice.

Why not passing slice to scx_bpf_dsq_insert_commit()? Is it because of the BPF
args limitation? In that case we could introduce a struct similar to
scx_bpf_dsq_insert_vtime_args.

Speaking of vtime, we may also need a scx_bpf_dsq_insert_vtime_commit() that
accepts dsq_vtime as well, otherwise how do we use priority DSQs with this new
transaction variant?

> 
> To support explicit token passing, rename scx_dsq_insert_commit() to
> scx_dsq_insert_buf() and add a qseq parameter. All existing callers
> preserve the original behavior.
> 
> This mechanism is intended for schedulers that do not implement
> properly synchronized dequeue. A scheduler whose ops.dequeue()
> synchronizes atomically with the dispatch path does not need this API.

I'm wondering if we should validate qseq also in mark_direct_dispatch(), it
seems that we're not validating qseq in the direct dispatch path, or am I
missing something?

Thanks,
-Andrea

> 
> Suggested-by: Tejun Heo <tj@kernel.org>
> Suggested-by: Kuba Piecuch <jpiecuch@google.com>
> Suggested-by: Andrea Righi <arighi@nvidia.com>
> Reported-by: Andrea Righi <arighi@nvidia.com>
> Link: https://lore.kernel.org/r/20260203230639.1259869-1-arighi@nvidia.com/
> Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> ---
>  kernel/sched/ext.c                       | 72 ++++++++++++++++++++++--
>  tools/sched_ext/include/scx/common.bpf.h |  2 +
>  2 files changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index b2741b6fb046..81483520f5cc 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -8315,8 +8315,8 @@ static bool scx_dsq_insert_preamble(struct scx_sched *sch, struct task_struct *p
>  	return true;
>  }
>  
> -static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
> -				  u64 dsq_id, u64 enq_flags)
> +static void scx_dsq_insert_buf(struct scx_sched *sch, struct task_struct *p,
> +			       u64 dsq_id, u64 enq_flags, unsigned long qseq)
>  {
>  	struct scx_dsp_ctx *dspc = &this_cpu_ptr(sch->pcpu)->dsp_ctx;
>  	struct task_struct *ddsp_task;
> @@ -8334,7 +8334,7 @@ static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
>  
>  	dspc->buf[dspc->cursor++] = (struct scx_dsp_buf_ent){
>  		.task = p,
> -		.qseq = atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK,
> +		.qseq = qseq,
>  		.dsq_id = dsq_id,
>  		.enq_flags = enq_flags,
>  	};
> @@ -8401,7 +8401,8 @@ __bpf_kfunc bool scx_bpf_dsq_insert___v2(struct task_struct *p, u64 dsq_id,
>  	else
>  		p->scx.slice = p->scx.slice ?: 1;
>  
> -	scx_dsq_insert_commit(sch, p, dsq_id, enq_flags);
> +	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags,
> +			      atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK);
>  
>  	return true;
>  }
> @@ -8429,7 +8430,8 @@ static bool scx_dsq_insert_vtime(struct scx_sched *sch, struct task_struct *p,
>  
>  	p->scx.dsq_vtime = vtime;
>  
> -	scx_dsq_insert_commit(sch, p, dsq_id, enq_flags | SCX_ENQ_DSQ_PRIQ);
> +	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags | SCX_ENQ_DSQ_PRIQ,
> +			      atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK);
>  
>  	return true;
>  }
> @@ -8518,13 +8520,72 @@ __bpf_kfunc void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64 dsq_id,
>  	scx_dsq_insert_vtime(sch, p, dsq_id, slice, vtime, enq_flags);
>  }
>  
> +/**
> + * scx_bpf_dsq_insert_begin - Begin a dispatch transaction for a task
> + * @p: task_struct to dispatch
> + *
> + * Returns an opaque u64 dispatch token. Pass the token to
> + * scx_bpf_dsq_insert_commit() to insert @p into a DSQ. If @p is dequeued
> + * or claimed by another path between scx_bpf_dsq_insert_begin() and
> + * scx_bpf_dsq_insert_commit(), the commit will silently fail.
> + *
> + * This API is intended for schedulers that do not implement properly
> + * synchronized dequeue.
> + */
> +__bpf_kfunc u64 scx_bpf_dsq_insert_begin(struct task_struct *p)
> +{
> +	return atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK;
> +}
> +
> +/**
> + * scx_bpf_dsq_insert_commit - Commit a dispatch transaction
> + * @p: task_struct to insert
> + * @dsq_id: DSQ to insert into
> + * @enq_flags: SCX_ENQ_*
> + * @token: token from scx_bpf_dsq_insert_begin()
> + * @aux: implicit BPF argument
> + *
> + * Like scx_bpf_dsq_insert() with slice=0, but commits a dispatch transaction
> + * begun with scx_bpf_dsq_insert_begin(). If @p was dequeued or claimed
> + * between begin and commit, the dispatch is silently discarded. Use
> + * scx_bpf_task_set_slice() to set a non-default slice.
> + *
> + * Returns %true if the entry was buffered for dispatch, %false on preamble
> + * failure (e.g. @p is not owned by this scheduler). Note: stale token
> + * detection fires asynchronously in finish_dispatch() after ops.dispatch()
> + * returns. A %true return does not guarantee the task was actually dispatched.
> + */
> +__bpf_kfunc bool scx_bpf_dsq_insert_commit(struct task_struct *p,
> +					    u64 dsq_id, u64 enq_flags,
> +					    u64 token,
> +					    const struct bpf_prog_aux *aux)
> +{
> +	struct scx_sched *sch;
> +
> +	guard(rcu)();
> +	sch = scx_prog_sched(aux);
> +	if (unlikely(!sch))
> +		return false;
> +
> +	if (!scx_dsq_insert_preamble(sch, p, dsq_id, &enq_flags))
> +		return false;
> +
> +	p->scx.slice = p->scx.slice ?: 1;
> +
> +	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags, (unsigned long)token);
> +
> +	return true;
> +}
> +
>  __bpf_kfunc_end_defs();
>  
>  BTF_KFUNCS_START(scx_kfunc_ids_enqueue_dispatch)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert___v2, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_commit, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, __scx_bpf_dsq_insert_vtime, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert_vtime, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_RCU)
>  BTF_KFUNCS_END(scx_kfunc_ids_enqueue_dispatch)
>  
>  static const struct btf_kfunc_id_set scx_kfunc_set_enqueue_dispatch = {
> @@ -10194,6 +10255,7 @@ BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
>  BTF_ID_FLAGS(func, scx_bpf_task_running, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_task_cid, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
>  BTF_ID_FLAGS(func, scx_bpf_locked_rq, KF_IMPLICIT_ARGS | KF_RET_NULL)
>  BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
> diff --git a/tools/sched_ext/include/scx/common.bpf.h b/tools/sched_ext/include/scx/common.bpf.h
> index 5f715d69cde6..fb793008e2e3 100644
> --- a/tools/sched_ext/include/scx/common.bpf.h
> +++ b/tools/sched_ext/include/scx/common.bpf.h
> @@ -63,6 +63,8 @@ s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
>  s32 __scx_bpf_select_cpu_and(struct task_struct *p, const struct cpumask *cpus_allowed,
>  			     struct scx_bpf_select_cpu_and_args *args) __ksym __weak;
>  bool __scx_bpf_dsq_insert_vtime(struct task_struct *p, struct scx_bpf_dsq_insert_vtime_args *args) __ksym __weak;
> +u64 scx_bpf_dsq_insert_begin(struct task_struct *p) __ksym __weak;
> +bool scx_bpf_dsq_insert_commit(struct task_struct *p, u64 dsq_id, u64 enq_flags, u64 token) __ksym __weak;
>  u32 scx_bpf_dispatch_nr_slots(void) __ksym;
>  void scx_bpf_dispatch_cancel(void) __ksym;
>  void scx_bpf_kick_cpu(s32 cpu, u64 flags) __ksym;
> -- 
> 2.48.1
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] sched_ext: Add dispatch transaction API
  2026-05-09 21:24   ` Andrea Righi
@ 2026-05-10 14:02     ` Tejun Heo
  2026-05-10 14:58       ` Andrea Righi
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2026-05-10 14:02 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Cheng-Yang Chou, sched-ext, David Vernet, Changwoo Min,
	Kuba Piecuch, Ching-Chun Huang, Chia-Ping Tsai

Hello,

On Sat, May 09, 2026 at 11:24:12PM +0200, Andrea Righi wrote:
> Why not passing slice to scx_bpf_dsq_insert_commit()? Is it because of the BPF
> args limitation? In that case we could introduce a struct similar to
> scx_bpf_dsq_insert_vtime_args.
> 
> Speaking of vtime, we may also need a scx_bpf_dsq_insert_vtime_commit() that
> accepts dsq_vtime as well, otherwise how do we use priority DSQs with this new
> transaction variant?

Given that these are only meaningful when targeting local DSQs, I don't
think vtime interface is necessary. It'd probably be a good idea to
explicitly restrict usage to local DSQs.

> > To support explicit token passing, rename scx_dsq_insert_commit() to
> > scx_dsq_insert_buf() and add a qseq parameter. All existing callers
> > preserve the original behavior.
> > 
> > This mechanism is intended for schedulers that do not implement
> > properly synchronized dequeue. A scheduler whose ops.dequeue()
> > synchronizes atomically with the dispatch path does not need this API.
> 
> I'm wondering if we should validate qseq also in mark_direct_dispatch(), it
> seems that we're not validating qseq in the direct dispatch path, or am I
> missing something?

In DD path, the task can't be lost and it'd be a bit silly to use this
interface. Again, probably good idea to at least start with just allowing in
dispatch path.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] sched_ext: Add dispatch transaction API
  2026-05-10 14:02     ` Tejun Heo
@ 2026-05-10 14:58       ` Andrea Righi
  2026-05-15 14:50         ` Cheng-Yang Chou
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Righi @ 2026-05-10 14:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Cheng-Yang Chou, sched-ext, David Vernet, Changwoo Min,
	Kuba Piecuch, Ching-Chun Huang, Chia-Ping Tsai

On Sun, May 10, 2026 at 04:02:33AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Sat, May 09, 2026 at 11:24:12PM +0200, Andrea Righi wrote:
> > Why not passing slice to scx_bpf_dsq_insert_commit()? Is it because of the BPF
> > args limitation? In that case we could introduce a struct similar to
> > scx_bpf_dsq_insert_vtime_args.
> > 
> > Speaking of vtime, we may also need a scx_bpf_dsq_insert_vtime_commit() that
> > accepts dsq_vtime as well, otherwise how do we use priority DSQs with this new
> > transaction variant?
> 
> Given that these are only meaningful when targeting local DSQs, I don't
> think vtime interface is necessary. It'd probably be a good idea to
> explicitly restrict usage to local DSQs.

Ack, it definitely makes sense to restrict this to local DSQs.

> 
> > > To support explicit token passing, rename scx_dsq_insert_commit() to
> > > scx_dsq_insert_buf() and add a qseq parameter. All existing callers
> > > preserve the original behavior.
> > > 
> > > This mechanism is intended for schedulers that do not implement
> > > properly synchronized dequeue. A scheduler whose ops.dequeue()
> > > synchronizes atomically with the dispatch path does not need this API.
> > 
> > I'm wondering if we should validate qseq also in mark_direct_dispatch(), it
> > seems that we're not validating qseq in the direct dispatch path, or am I
> > missing something?
> 
> In DD path, the task can't be lost and it'd be a bit silly to use this
> interface. Again, probably good idea to at least start with just allowing in
> dispatch path.

Also true. Thanks for clarifying.

-Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] sched_ext: Add dispatch transaction API
  2026-05-10 14:58       ` Andrea Righi
@ 2026-05-15 14:50         ` Cheng-Yang Chou
  0 siblings, 0 replies; 13+ messages in thread
From: Cheng-Yang Chou @ 2026-05-15 14:50 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Tejun Heo, sched-ext, David Vernet, Changwoo Min, Kuba Piecuch,
	Ching-Chun Huang, Chia-Ping Tsai

Hi Andrea, Tejun,

On Sun, May 10, 2026 at 04:58:13PM +0200, Andrea Righi wrote:
> On Sun, May 10, 2026 at 04:02:33AM -1000, Tejun Heo wrote:
> > Hello,
> > 
> > On Sat, May 09, 2026 at 11:24:12PM +0200, Andrea Righi wrote:
> > > Why not passing slice to scx_bpf_dsq_insert_commit()? Is it because of the BPF
> > > args limitation? In that case we could introduce a struct similar to
> > > scx_bpf_dsq_insert_vtime_args.

Introduced scx_bpf_dsq_insert_commit_args wrapping dsq_id, slice and
enq_flags to stay within the BPF verifier's five-arg limit.

> > > 
> > > Speaking of vtime, we may also need a scx_bpf_dsq_insert_vtime_commit() that
> > > accepts dsq_vtime as well, otherwise how do we use priority DSQs with this new
> > > transaction variant?
> > 
> > Given that these are only meaningful when targeting local DSQs, I don't
> > think vtime interface is necessary. It'd probably be a good idea to
> > explicitly restrict usage to local DSQs.
> 
> Ack, it definitely makes sense to restrict this to local DSQs.

Restricted scx_bpf_dsq_insert_commit() to local DSQs at runtime,
targeting a non-local DSQ aborts the sched via scx_error()

> 
> > 
> > > > To support explicit token passing, rename scx_dsq_insert_commit() to
> > > > scx_dsq_insert_buf() and add a qseq parameter. All existing callers
> > > > preserve the original behavior.
> > > > 
> > > > This mechanism is intended for schedulers that do not implement
> > > > properly synchronized dequeue. A scheduler whose ops.dequeue()
> > > > synchronizes atomically with the dispatch path does not need this API.
> > > 
> > > I'm wondering if we should validate qseq also in mark_direct_dispatch(), it
> > > seems that we're not validating qseq in the direct dispatch path, or am I
> > > missing something?
> > 
> > In DD path, the task can't be lost and it'd be a bit silly to use this
> > interface. Again, probably good idea to at least start with just allowing in
> > dispatch path.

Restricted scx_bpf_dsq_insert_commit() to ops.dispatch() only by moviing
it from scx_kfunc_ids_enqueue_dispatch to scx_kfunc_ids_dispatch.

> 
> Also true. Thanks for clarifying.

And, added a comment to mark_direct_dispatch() explaining why qseq
validation is not needed.

Thanks!

-- 
Cheers,
Cheng-Yang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] sched_ext: Add dispatch transaction API
  2026-05-09 19:11 ` [PATCH v3 1/2] " Cheng-Yang Chou
  2026-05-09 19:49   ` sashiko-bot
  2026-05-09 21:24   ` Andrea Righi
@ 2026-05-10 14:06   ` Tejun Heo
  2026-05-15 14:36     ` Cheng-Yang Chou
  2 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2026-05-10 14:06 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: sched-ext, David Vernet, Andrea Righi, Changwoo Min, Kuba Piecuch,
	Ching-Chun Huang, Chia-Ping Tsai

On Sun, May 10, 2026 at 03:11:56AM +0800, Cheng-Yang Chou wrote:
> -static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
> -				  u64 dsq_id, u64 enq_flags)
> +static void scx_dsq_insert_buf(struct scx_sched *sch, struct task_struct *p,
> +			       u64 dsq_id, u64 enq_flags, unsigned long qseq)

Please don't mix rename and functional changes in the same patch. Can we do
scx_dsq_insert_stage() instead?

> +/**
> + * scx_bpf_dsq_insert_commit - Commit a dispatch transaction
> + * @p: task_struct to insert
> + * @dsq_id: DSQ to insert into
> + * @enq_flags: SCX_ENQ_*
> + * @token: token from scx_bpf_dsq_insert_begin()
> + * @aux: implicit BPF argument
> + *
> + * Like scx_bpf_dsq_insert() with slice=0, but commits a dispatch transaction
> + * begun with scx_bpf_dsq_insert_begin(). If @p was dequeued or claimed
> + * between begin and commit, the dispatch is silently discarded. Use
> + * scx_bpf_task_set_slice() to set a non-default slice.
> + *
> + * Returns %true if the entry was buffered for dispatch, %false on preamble
> + * failure (e.g. @p is not owned by this scheduler). Note: stale token
> + * detection fires asynchronously in finish_dispatch() after ops.dispatch()
> + * returns. A %true return does not guarantee the task was actually dispatched.
> + */

I think this needs more detailed documentation. It's difficult to tell what
it provides and why one would want to use it from the above and it's a
pretty subtle thing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] sched_ext: Add dispatch transaction API
  2026-05-10 14:06   ` Tejun Heo
@ 2026-05-15 14:36     ` Cheng-Yang Chou
  0 siblings, 0 replies; 13+ messages in thread
From: Cheng-Yang Chou @ 2026-05-15 14:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: sched-ext, David Vernet, Andrea Righi, Changwoo Min, Kuba Piecuch,
	Ching-Chun Huang, Chia-Ping Tsai

Hi Tejun,

On Sun, May 10, 2026 at 04:06:19AM -1000, Tejun Heo wrote:
> On Sun, May 10, 2026 at 03:11:56AM +0800, Cheng-Yang Chou wrote:
> > -static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
> > -				  u64 dsq_id, u64 enq_flags)
> > +static void scx_dsq_insert_buf(struct scx_sched *sch, struct task_struct *p,
> > +			       u64 dsq_id, u64 enq_flags, unsigned long qseq)
> 
> Please don't mix rename and functional changes in the same patch. Can we do
> scx_dsq_insert_stage() instead?

Seperated into its own commit using scx_diq_insert_stage() in v4.

> 
> > +/**
> > + * scx_bpf_dsq_insert_commit - Commit a dispatch transaction
> > + * @p: task_struct to insert
> > + * @dsq_id: DSQ to insert into
> > + * @enq_flags: SCX_ENQ_*
> > + * @token: token from scx_bpf_dsq_insert_begin()
> > + * @aux: implicit BPF argument
> > + *
> > + * Like scx_bpf_dsq_insert() with slice=0, but commits a dispatch transaction
> > + * begun with scx_bpf_dsq_insert_begin(). If @p was dequeued or claimed
> > + * between begin and commit, the dispatch is silently discarded. Use
> > + * scx_bpf_task_set_slice() to set a non-default slice.
> > + *
> > + * Returns %true if the entry was buffered for dispatch, %false on preamble
> > + * failure (e.g. @p is not owned by this scheduler). Note: stale token
> > + * detection fires asynchronously in finish_dispatch() after ops.dispatch()
> > + * returns. A %true return does not guarantee the task was actually dispatched.
> > + */
> 
> I think this needs more detailed documentation. It's difficult to tell what
> it provides and why one would want to use it from the above and it's a
> pretty subtle thing.

Expanded in v4, the kdoc now covers the race scenario where scx_bpf_dsq_insert()
misses pre-dispatch validity checks, the intended usage with ops.dequeue(),
and the caveat that a %true return doesn't guarantee actual dispatch.

Thanks.

-- 
Cheers,
Cheng-Yang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test
  2026-05-09 19:11 [PATCH v3 sched_ext/for-7.2 0/2] sched_ext: Add dispatch transaction API Cheng-Yang Chou
  2026-05-09 19:11 ` [PATCH v3 1/2] " Cheng-Yang Chou
@ 2026-05-09 19:11 ` Cheng-Yang Chou
  2026-05-09 20:16   ` sashiko-bot
  2026-05-09 21:43   ` Andrea Righi
  1 sibling, 2 replies; 13+ messages in thread
From: Cheng-Yang Chou @ 2026-05-09 19:11 UTC (permalink / raw)
  To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
  Cc: Kuba Piecuch, Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911

Test scx_bpf_dsq_insert_begin() and scx_bpf_dsq_insert_commit().

The BPF scheduler enqueues tasks into a BPF queue map in ops.enqueue()
and dispatches them via the begin/commit transaction API in ops.dispatch().
After a successful dispatch, the token is stored. On the task's next
dispatch (after dequeue/re-enqueue increments qseq), the stored token is
stale and finish_dispatch() silently drops the buffered entry.

Userspace forks spinning children and repeatedly flips their CPU affinity
to trigger dequeue/re-enqueue cycles. The test verifies both nr_tx_dispatched
and nr_tx_stale are positive, exercising both the happy path and the
race-detection window.

Suggested-by: Tejun Heo <tj@kernel.org>
Suggested-by: Kuba Piecuch <jpiecuch@google.com>
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../selftests/sched_ext/dispatch_cookie.bpf.c | 102 ++++++++++++++
 .../selftests/sched_ext/dispatch_cookie.c     | 131 ++++++++++++++++++
 3 files changed, 234 insertions(+)
 create mode 100644 tools/testing/selftests/sched_ext/dispatch_cookie.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/dispatch_cookie.c

diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index 5d2dffca0e91..ae3dc0913378 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -164,6 +164,7 @@ all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubs
 auto-test-targets :=			\
 	create_dsq			\
 	dequeue				\
+	dispatch_cookie			\
 	enq_last_no_enq_fails		\
 	ddsp_bogus_dsq_fail		\
 	ddsp_vtimelocal_fail		\
diff --git a/tools/testing/selftests/sched_ext/dispatch_cookie.bpf.c b/tools/testing/selftests/sched_ext/dispatch_cookie.bpf.c
new file mode 100644
index 000000000000..ed31cddcda7b
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/dispatch_cookie.bpf.c
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Test scx_bpf_dsq_insert_begin() and scx_bpf_dsq_insert_commit().
+ *
+ * Exercises both the happy path (fresh token committed successfully) and
+ * the race-detection path (stale token from a previous dispatch cycle
+ * rejected after the task was dequeued and re-enqueued, incrementing qseq).
+ *
+ * Copyright (C) 2026 Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
+ * Copyright (C) 2026 Cheng-Yang Chou <yphbchou0911@gmail.com>
+ */
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+UEI_DEFINE(uei);
+
+struct {
+	__uint(type, BPF_MAP_TYPE_QUEUE);
+	__uint(max_entries, 8192);
+	__type(value, s32);
+} queue SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 4096);
+	__type(key, s32);
+	__type(value, u64);
+} last_token SEC(".maps");
+
+long nr_tx_dispatched;
+long nr_tx_stale;
+
+void BPF_STRUCT_OPS(dispatch_cookie_enqueue, struct task_struct *p,
+		    u64 enq_flags)
+{
+	s32 pid = p->pid;
+
+	if (bpf_map_push_elem(&queue, &pid, 0))
+		scx_bpf_error("Failed to enqueue %s[%d]", p->comm, p->pid);
+}
+
+void BPF_STRUCT_OPS(dispatch_cookie_dispatch, s32 cpu,
+		    struct task_struct *prev)
+{
+	s32 pid;
+	struct task_struct *p;
+	u64 *stored, token;
+
+	if (bpf_map_pop_elem(&queue, &pid))
+		return;
+
+	p = bpf_task_from_pid(pid);
+	if (!p)
+		return;
+
+	/*
+	 * After a successful fresh dispatch, store the token. On the task's
+	 * next dispatch (after re-enqueue increments qseq), the stored token
+	 * exercises the race-detection window in finish_dispatch().
+	 *
+	 * scx_bpf_dsq_insert_commit() always returns %true when the preamble
+	 * passes. Stale detection fires asynchronously in finish_dispatch()
+	 * with no BPF-observable signal. When using a stored token, always
+	 * pair the commit() call with a fallback scx_bpf_dsq_insert(): if
+	 * the token is stale, finish_dispatch() drops the first entry and
+	 * the fallback dispatches the task. If the token is still fresh,
+	 * finish_dispatch() dispatches it and the fallback's CAS is a no-op.
+	 */
+	stored = bpf_map_lookup_elem(&last_token, &pid);
+	if (stored) {
+		token = *stored;
+		bpf_map_delete_elem(&last_token, &pid);
+		scx_bpf_dsq_insert_commit(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, token);
+		scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0);
+		__sync_fetch_and_add(&nr_tx_stale, 1);
+	} else {
+		token = scx_bpf_dsq_insert_begin(p);
+		if (scx_bpf_dsq_insert_commit(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, token)) {
+			__sync_fetch_and_add(&nr_tx_dispatched, 1);
+			bpf_map_update_elem(&last_token, &pid, &token, BPF_ANY);
+		} else {
+			scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0);
+		}
+	}
+
+	bpf_task_release(p);
+}
+
+void BPF_STRUCT_OPS(dispatch_cookie_exit, struct scx_exit_info *ei)
+{
+	UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops dispatch_cookie_ops = {
+	.enqueue		= (void *) dispatch_cookie_enqueue,
+	.dispatch		= (void *) dispatch_cookie_dispatch,
+	.exit			= (void *) dispatch_cookie_exit,
+	.name			= "dispatch_cookie",
+	.timeout_ms		= 5000U,
+};
diff --git a/tools/testing/selftests/sched_ext/dispatch_cookie.c b/tools/testing/selftests/sched_ext/dispatch_cookie.c
new file mode 100644
index 000000000000..6b46ca4efeb4
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/dispatch_cookie.c
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Test scx_bpf_dsq_insert_begin() and scx_bpf_dsq_insert_commit().
+ *
+ * Copyright (C) 2026 Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
+ * Copyright (C) 2026 Cheng-Yang Chou <yphbchou0911@gmail.com>
+ */
+#define _GNU_SOURCE
+#include <bpf/bpf.h>
+#include <errno.h>
+#include <sched.h>
+#include <scx/common.h>
+#include <signal.h>
+#include <stdlib.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include "dispatch_cookie.bpf.skel.h"
+#include "scx_test.h"
+
+#define NUM_CHILDREN	32
+#define NUM_FLIP_ITERS	100
+
+struct dispatch_cookie_ctx {
+	struct dispatch_cookie	*skel;
+	struct bpf_link		*link;
+};
+
+static enum scx_test_status setup(void **ctx)
+{
+	struct dispatch_cookie_ctx *tctx;
+
+	if (!__COMPAT_has_ksym("scx_bpf_dsq_insert_begin")) {
+		fprintf(stderr, "SKIP: dispatch transaction API not supported\n");
+		return SCX_TEST_SKIP;
+	}
+
+	tctx = malloc(sizeof(*tctx));
+	SCX_FAIL_IF(!tctx, "Failed to allocate test context");
+	tctx->link = NULL;
+
+	tctx->skel = dispatch_cookie__open();
+	if (!tctx->skel) {
+		free(tctx);
+		SCX_FAIL("Failed to open skel");
+	}
+	SCX_ENUM_INIT(tctx->skel);
+	if (dispatch_cookie__load(tctx->skel)) {
+		dispatch_cookie__destroy(tctx->skel);
+		free(tctx);
+		SCX_FAIL("Failed to load skel");
+	}
+
+	*ctx = tctx;
+
+	return SCX_TEST_PASS;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+	struct dispatch_cookie_ctx *tctx = ctx;
+	cpu_set_t cpuset_one, cpuset_all;
+	pid_t pids[NUM_CHILDREN];
+	int i, j, nforked = 0, status;
+
+	tctx->link = bpf_map__attach_struct_ops(tctx->skel->maps.dispatch_cookie_ops);
+	SCX_FAIL_IF(!tctx->link, "Failed to attach scheduler");
+
+	CPU_ZERO(&cpuset_one);
+	CPU_SET(0, &cpuset_one);
+	SCX_FAIL_IF(sched_getaffinity(0, sizeof(cpuset_all), &cpuset_all),
+		    "Failed to get CPU affinity (%d)", errno);
+
+	for (i = 0; i < NUM_CHILDREN; i++) {
+		pids[i] = fork();
+		if (pids[i] == 0) {
+			while (1)
+				sched_yield();
+		}
+		if (pids[i] > 0)
+			nforked++;
+	}
+
+	/*
+	 * Flip affinity to trigger dequeue/re-enqueue, which increments qseq
+	 * and makes previously captured tokens stale.
+	 */
+	for (i = 0; i < NUM_FLIP_ITERS; i++) {
+		for (j = 0; j < NUM_CHILDREN; j++) {
+			if (pids[j] <= 0)
+				continue;
+			sched_setaffinity(pids[j], sizeof(cpuset_one),
+					  &cpuset_one);
+			sched_setaffinity(pids[j], sizeof(cpuset_all),
+					  &cpuset_all);
+		}
+		usleep(1000);
+	}
+
+	for (i = 0; i < NUM_CHILDREN; i++) {
+		if (pids[i] <= 0)
+			continue;
+		kill(pids[i], SIGKILL);
+		waitpid(pids[i], &status, 0);
+	}
+
+	SCX_GT(nforked, 0);
+	SCX_GT(tctx->skel->bss->nr_tx_dispatched, 0);
+	SCX_GT(tctx->skel->bss->nr_tx_stale, 0);
+
+	return SCX_TEST_PASS;
+}
+
+static void cleanup(void *ctx)
+{
+	struct dispatch_cookie_ctx *tctx = ctx;
+
+	if (tctx->link)
+		bpf_link__destroy(tctx->link);
+	dispatch_cookie__destroy(tctx->skel);
+	free(tctx);
+}
+
+struct scx_test dispatch_cookie = {
+	.name		= "dispatch_cookie",
+	.description	= "Verify scx_bpf_dsq_insert_begin() and "
+			  "scx_bpf_dsq_insert_commit() dispatch tasks correctly",
+	.setup		= setup,
+	.run		= run,
+	.cleanup	= cleanup,
+};
+REGISTER_SCX_TEST(&dispatch_cookie)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test
  2026-05-09 19:11 ` [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test Cheng-Yang Chou
@ 2026-05-09 20:16   ` sashiko-bot
  2026-05-09 21:43   ` Andrea Righi
  1 sibling, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-05-09 20:16 UTC (permalink / raw)
  To: Cheng-Yang Chou; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] `scx_bpf_dsq_insert_commit()` is incorrectly called with `SCX_SLICE_DFL` as the `enq_flags` argument in the test.
--

commit 080e78551f77aa8abf00b5d2d1e0d0b5fa62c1b7
Author: Cheng-Yang Chou <yphbchou0911@gmail.com>

selftests/sched_ext: Add dispatch_cookie test

This commit adds a new BPF scheduler selftest to verify the behavior of the
scx_bpf_dsq_insert_begin() and scx_bpf_dsq_insert_commit() APIs.

> diff --git a/tools/testing/selftests/sched_ext/dispatch_cookie.bpf.c b/tools/testing/selftests/sched_ext/dispatch_cookie.bpf.c
> new file mode 100644
> index 0000000000000..ed31cddcda7b6
> --- /dev/null
> +++ b/tools/testing/selftests/sched_ext/dispatch_cookie.bpf.c
[ ... ]
> +void BPF_STRUCT_OPS(dispatch_cookie_dispatch, s32 cpu,
> +		    struct task_struct *prev)
> +{
[ ... ]
> +	stored = bpf_map_lookup_elem(&last_token, &pid);
> +	if (stored) {
> +		token = *stored;
> +		bpf_map_delete_elem(&last_token, &pid);
> +		scx_bpf_dsq_insert_commit(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, token);

Is SCX_SLICE_DFL intended to be used as the enq_flags argument here?

Looking at the function signature in kernel/sched/ext.c:

bool scx_bpf_dsq_insert_commit(struct task_struct *p, u64 dsq_id,
                               u64 enq_flags, u64 token, ...)

It appears scx_bpf_dsq_insert_commit() does not take a slice parameter
like scx_bpf_dsq_insert() does.

Since SCX_SLICE_DFL is a time slice duration, passing it into the enq_flags
parameter might inject undocumented garbage bits into the scheduler's
enqueue operation. Does this unintentionally alter the enqueue behavior?

> +		scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0);
> +		__sync_fetch_and_add(&nr_tx_stale, 1);
> +	} else {
> +		token = scx_bpf_dsq_insert_begin(p);
> +		if (scx_bpf_dsq_insert_commit(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, token)) {

There is another instance here where SCX_SLICE_DFL is passed as the third
argument to scx_bpf_dsq_insert_commit().

> +			__sync_fetch_and_add(&nr_tx_dispatched, 1);
> +			bpf_map_update_elem(&last_token, &pid, &token, BPF_ANY);
> +		} else {
> +			scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0);
> +		}
> +	}
> +
> +	bpf_task_release(p);
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260509191223.168648-1-yphbchou0911@gmail.com?part=2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test
  2026-05-09 19:11 ` [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test Cheng-Yang Chou
  2026-05-09 20:16   ` sashiko-bot
@ 2026-05-09 21:43   ` Andrea Righi
  2026-05-15 13:42     ` Cheng-Yang Chou
  1 sibling, 1 reply; 13+ messages in thread
From: Andrea Righi @ 2026-05-09 21:43 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: sched-ext, Tejun Heo, David Vernet, Changwoo Min, Kuba Piecuch,
	Ching-Chun Huang, Chia-Ping Tsai

Hi Cheng-Yang,

On Sun, May 10, 2026 at 03:11:57AM +0800, Cheng-Yang Chou wrote:
> Test scx_bpf_dsq_insert_begin() and scx_bpf_dsq_insert_commit().
> 
> The BPF scheduler enqueues tasks into a BPF queue map in ops.enqueue()
> and dispatches them via the begin/commit transaction API in ops.dispatch().
> After a successful dispatch, the token is stored. On the task's next
> dispatch (after dequeue/re-enqueue increments qseq), the stored token is
> stale and finish_dispatch() silently drops the buffered entry.
> 
> Userspace forks spinning children and repeatedly flips their CPU affinity
> to trigger dequeue/re-enqueue cycles. The test verifies both nr_tx_dispatched
> and nr_tx_stale are positive, exercising both the happy path and the
> race-detection window.
> 
> Suggested-by: Tejun Heo <tj@kernel.org>
> Suggested-by: Kuba Piecuch <jpiecuch@google.com>
> Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> ---
...
> +void BPF_STRUCT_OPS(dispatch_cookie_dispatch, s32 cpu,
> +		    struct task_struct *prev)
> +{
> +	s32 pid;
> +	struct task_struct *p;
> +	u64 *stored, token;
> +
> +	if (bpf_map_pop_elem(&queue, &pid))
> +		return;
> +
> +	p = bpf_task_from_pid(pid);
> +	if (!p)
> +		return;
> +
> +	/*
> +	 * After a successful fresh dispatch, store the token. On the task's
> +	 * next dispatch (after re-enqueue increments qseq), the stored token
> +	 * exercises the race-detection window in finish_dispatch().
> +	 *
> +	 * scx_bpf_dsq_insert_commit() always returns %true when the preamble
> +	 * passes. Stale detection fires asynchronously in finish_dispatch()
> +	 * with no BPF-observable signal. When using a stored token, always
> +	 * pair the commit() call with a fallback scx_bpf_dsq_insert(): if
> +	 * the token is stale, finish_dispatch() drops the first entry and
> +	 * the fallback dispatches the task. If the token is still fresh,
> +	 * finish_dispatch() dispatches it and the fallback's CAS is a no-op.
> +	 */
> +	stored = bpf_map_lookup_elem(&last_token, &pid);
> +	if (stored) {
> +		token = *stored;
> +		bpf_map_delete_elem(&last_token, &pid);
> +		scx_bpf_dsq_insert_commit(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, token);

I think we should pass 0 instead of SCX_SLICE_DFL here:

 scx_bpf_dsq_insert_commit(p, dsq_id, enq_flags, token)

> +		scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0);
> +		__sync_fetch_and_add(&nr_tx_stale, 1);
> +	} else {
> +		token = scx_bpf_dsq_insert_begin(p);
> +		if (scx_bpf_dsq_insert_commit(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, token)) {

Ditto.

> +			__sync_fetch_and_add(&nr_tx_dispatched, 1);

Also, stale detection happens asynchronously later in finish_dispatch() with no
BPF-observable signal, so commit can return true here and write the buffered
entry with stale qseq=N, but the counter is incremented before finish_dispatch()
runs. Later, when finish_dispatch() processes the entry, it detects the qseq
mismatch (task now has N+1) and silently drops the dispatch. So nr_tx_stale
counts "attempted stale commits", not "detected and rejected stale tokens".

> +			bpf_map_update_elem(&last_token, &pid, &token, BPF_ANY);
> +		} else {
> +			scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0);
> +		}
> +	}
> +
> +	bpf_task_release(p);
> +}
> +
> +void BPF_STRUCT_OPS(dispatch_cookie_exit, struct scx_exit_info *ei)
> +{
> +	UEI_RECORD(uei, ei);
> +}
> +
> +SEC(".struct_ops.link")
> +struct sched_ext_ops dispatch_cookie_ops = {
> +	.enqueue		= (void *) dispatch_cookie_enqueue,
> +	.dispatch		= (void *) dispatch_cookie_dispatch,
> +	.exit			= (void *) dispatch_cookie_exit,
> +	.name			= "dispatch_cookie",
> +	.timeout_ms		= 5000U,
> +};

...

> +static enum scx_test_status run(void *ctx)
> +{
> +	struct dispatch_cookie_ctx *tctx = ctx;
> +	cpu_set_t cpuset_one, cpuset_all;
> +	pid_t pids[NUM_CHILDREN];
> +	int i, j, nforked = 0, status;
> +
> +	tctx->link = bpf_map__attach_struct_ops(tctx->skel->maps.dispatch_cookie_ops);
> +	SCX_FAIL_IF(!tctx->link, "Failed to attach scheduler");
> +
> +	CPU_ZERO(&cpuset_one);
> +	CPU_SET(0, &cpuset_one);

This is probably fine as it is, but it's worth noticing that if for any reason
CPU0 is offline the test is probably failing.

> +	SCX_FAIL_IF(sched_getaffinity(0, sizeof(cpuset_all), &cpuset_all),
> +		    "Failed to get CPU affinity (%d)", errno);
> +
> +	for (i = 0; i < NUM_CHILDREN; i++) {
> +		pids[i] = fork();
> +		if (pids[i] == 0) {
> +			while (1)
> +				sched_yield();
> +		}
> +		if (pids[i] > 0)
> +			nforked++;
> +	}

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test
  2026-05-09 21:43   ` Andrea Righi
@ 2026-05-15 13:42     ` Cheng-Yang Chou
  0 siblings, 0 replies; 13+ messages in thread
From: Cheng-Yang Chou @ 2026-05-15 13:42 UTC (permalink / raw)
  To: Andrea Righi
  Cc: sched-ext, Tejun Heo, David Vernet, Changwoo Min, Kuba Piecuch,
	Ching-Chun Huang, Chia-Ping Tsai

Hi Andrea,

On Sat, May 09, 2026 at 11:43:07PM +0200, Andrea Righi wrote:
> ...
> > +void BPF_STRUCT_OPS(dispatch_cookie_dispatch, s32 cpu,
> > +		    struct task_struct *prev)
> > +{
> > +	s32 pid;
> > +	struct task_struct *p;
> > +	u64 *stored, token;
> > +
> > +	if (bpf_map_pop_elem(&queue, &pid))
> > +		return;
> > +
> > +	p = bpf_task_from_pid(pid);
> > +	if (!p)
> > +		return;
> > +
> > +	/*
> > +	 * After a successful fresh dispatch, store the token. On the task's
> > +	 * next dispatch (after re-enqueue increments qseq), the stored token
> > +	 * exercises the race-detection window in finish_dispatch().
> > +	 *
> > +	 * scx_bpf_dsq_insert_commit() always returns %true when the preamble
> > +	 * passes. Stale detection fires asynchronously in finish_dispatch()
> > +	 * with no BPF-observable signal. When using a stored token, always
> > +	 * pair the commit() call with a fallback scx_bpf_dsq_insert(): if
> > +	 * the token is stale, finish_dispatch() drops the first entry and
> > +	 * the fallback dispatches the task. If the token is still fresh,
> > +	 * finish_dispatch() dispatches it and the fallback's CAS is a no-op.
> > +	 */
> > +	stored = bpf_map_lookup_elem(&last_token, &pid);
> > +	if (stored) {
> > +		token = *stored;
> > +		bpf_map_delete_elem(&last_token, &pid);
> > +		scx_bpf_dsq_insert_commit(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, token);
> 
> I think we should pass 0 instead of SCX_SLICE_DFL here:
> 
>  scx_bpf_dsq_insert_commit(p, dsq_id, enq_flags, token)
> 
> > +		scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0);
> > +		__sync_fetch_and_add(&nr_tx_stale, 1);
> > +	} else {
> > +		token = scx_bpf_dsq_insert_begin(p);
> > +		if (scx_bpf_dsq_insert_commit(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, token)) {
> 
> Ditto.

Ack! v4 wraps the arguments in scx_bpf_dsq_insert_commit_args with
slice=0, which preserves the task's existing slice instead of zeroing it.

> 
> > +			__sync_fetch_and_add(&nr_tx_dispatched, 1);
> 
> Also, stale detection happens asynchronously later in finish_dispatch() with no
> BPF-observable signal, so commit can return true here and write the buffered
> entry with stale qseq=N, but the counter is incremented before finish_dispatch()
> runs. Later, when finish_dispatch() processes the entry, it detects the qseq
> mismatch (task now has N+1) and silently drops the dispatch. So nr_tx_stale
> counts "attempted stale commits", not "detected and rejected stale tokens".
> 

Ack, the comment now reads "counts attempted stale commits, not
rejections detected by finish_dispatch()".

> ...
> 
> > +static enum scx_test_status run(void *ctx)
> > +{
> > +	struct dispatch_cookie_ctx *tctx = ctx;
> > +	cpu_set_t cpuset_one, cpuset_all;
> > +	pid_t pids[NUM_CHILDREN];
> > +	int i, j, nforked = 0, status;
> > +
> > +	tctx->link = bpf_map__attach_struct_ops(tctx->skel->maps.dispatch_cookie_ops);
> > +	SCX_FAIL_IF(!tctx->link, "Failed to attach scheduler");
> > +
> > +	CPU_ZERO(&cpuset_one);
> > +	CPU_SET(0, &cpuset_one);
> 
> This is probably fine as it is, but it's worth noticing that if for any reason
> CPU0 is offline the test is probably failing.

Make sense. Fixed by calling sched_getaffinity() to find the first
online CPU to avoid failures if CPU 0 is somehow offline.

Thanks!

-- 
Cheers,
Cheng-Yang

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-05-15 14:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-09 19:11 [PATCH v3 sched_ext/for-7.2 0/2] sched_ext: Add dispatch transaction API Cheng-Yang Chou
2026-05-09 19:11 ` [PATCH v3 1/2] " Cheng-Yang Chou
2026-05-09 19:49   ` sashiko-bot
2026-05-09 21:24   ` Andrea Righi
2026-05-10 14:02     ` Tejun Heo
2026-05-10 14:58       ` Andrea Righi
2026-05-15 14:50         ` Cheng-Yang Chou
2026-05-10 14:06   ` Tejun Heo
2026-05-15 14:36     ` Cheng-Yang Chou
2026-05-09 19:11 ` [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test Cheng-Yang Chou
2026-05-09 20:16   ` sashiko-bot
2026-05-09 21:43   ` Andrea Righi
2026-05-15 13:42     ` Cheng-Yang Chou

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.