All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Cheng-Yang Chou <yphbchou0911@gmail.com>
Cc: sched-ext@lists.linux.dev, Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Kuba Piecuch <jpiecuch@google.com>,
	Ching-Chun Huang <jserv@ccns.ncku.edu.tw>,
	Chia-Ping Tsai <chia7712@gmail.com>
Subject: Re: [PATCH v3 1/2] sched_ext: Add dispatch transaction API
Date: Sat, 9 May 2026 23:24:12 +0200	[thread overview]
Message-ID: <af-l_M9p9npc9FUA@gpd4> (raw)
In-Reply-To: <20260509191223.168648-2-yphbchou0911@gmail.com>

Hi Cheng-Yang,

On Sun, May 10, 2026 at 03:11:56AM +0800, Cheng-Yang Chou wrote:
> scx_bpf_dsq_insert() captures the task's dispatch token at insert time.
> Any BPF-side validity checks performed before the insert fall outside
> the race detection window: a dequeue/re-enqueue occurring between the
> check and the insert goes undetected, and finish_dispatch() proceeds
> with stale assumptions.
> 
> Introduce two new kfuncs to extend the detection window via a dispatch
> transaction:
> 
> - scx_bpf_dsq_insert_begin(p)
>     Starts a dispatch transaction for @p and returns an opaque u64
>     token. The BPF scheduler should call this before performing
>     pre-dispatch validity checks. The token may be stored in BPF maps
>     to support cross-CPU dispatch patterns.
> 
> - scx_bpf_dsq_insert_commit(p, dsq_id, enq_flags, token)
>     Like scx_bpf_dsq_insert() with slice=0, but commits using the
>     token captured by scx_bpf_dsq_insert_begin(). If @p was dequeued
>     or claimed between begin and commit, the transaction is silently
>     discarded. Use scx_bpf_task_set_slice() to set a non-default slice.

Why not passing slice to scx_bpf_dsq_insert_commit()? Is it because of the BPF
args limitation? In that case we could introduce a struct similar to
scx_bpf_dsq_insert_vtime_args.

Speaking of vtime, we may also need a scx_bpf_dsq_insert_vtime_commit() that
accepts dsq_vtime as well, otherwise how do we use priority DSQs with this new
transaction variant?

> 
> To support explicit token passing, rename scx_dsq_insert_commit() to
> scx_dsq_insert_buf() and add a qseq parameter. All existing callers
> preserve the original behavior.
> 
> This mechanism is intended for schedulers that do not implement
> properly synchronized dequeue. A scheduler whose ops.dequeue()
> synchronizes atomically with the dispatch path does not need this API.

I'm wondering if we should validate qseq also in mark_direct_dispatch(), it
seems that we're not validating qseq in the direct dispatch path, or am I
missing something?

Thanks,
-Andrea

> 
> Suggested-by: Tejun Heo <tj@kernel.org>
> Suggested-by: Kuba Piecuch <jpiecuch@google.com>
> Suggested-by: Andrea Righi <arighi@nvidia.com>
> Reported-by: Andrea Righi <arighi@nvidia.com>
> Link: https://lore.kernel.org/r/20260203230639.1259869-1-arighi@nvidia.com/
> Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> ---
>  kernel/sched/ext.c                       | 72 ++++++++++++++++++++++--
>  tools/sched_ext/include/scx/common.bpf.h |  2 +
>  2 files changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index b2741b6fb046..81483520f5cc 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -8315,8 +8315,8 @@ static bool scx_dsq_insert_preamble(struct scx_sched *sch, struct task_struct *p
>  	return true;
>  }
>  
> -static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
> -				  u64 dsq_id, u64 enq_flags)
> +static void scx_dsq_insert_buf(struct scx_sched *sch, struct task_struct *p,
> +			       u64 dsq_id, u64 enq_flags, unsigned long qseq)
>  {
>  	struct scx_dsp_ctx *dspc = &this_cpu_ptr(sch->pcpu)->dsp_ctx;
>  	struct task_struct *ddsp_task;
> @@ -8334,7 +8334,7 @@ static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
>  
>  	dspc->buf[dspc->cursor++] = (struct scx_dsp_buf_ent){
>  		.task = p,
> -		.qseq = atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK,
> +		.qseq = qseq,
>  		.dsq_id = dsq_id,
>  		.enq_flags = enq_flags,
>  	};
> @@ -8401,7 +8401,8 @@ __bpf_kfunc bool scx_bpf_dsq_insert___v2(struct task_struct *p, u64 dsq_id,
>  	else
>  		p->scx.slice = p->scx.slice ?: 1;
>  
> -	scx_dsq_insert_commit(sch, p, dsq_id, enq_flags);
> +	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags,
> +			      atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK);
>  
>  	return true;
>  }
> @@ -8429,7 +8430,8 @@ static bool scx_dsq_insert_vtime(struct scx_sched *sch, struct task_struct *p,
>  
>  	p->scx.dsq_vtime = vtime;
>  
> -	scx_dsq_insert_commit(sch, p, dsq_id, enq_flags | SCX_ENQ_DSQ_PRIQ);
> +	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags | SCX_ENQ_DSQ_PRIQ,
> +			      atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK);
>  
>  	return true;
>  }
> @@ -8518,13 +8520,72 @@ __bpf_kfunc void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64 dsq_id,
>  	scx_dsq_insert_vtime(sch, p, dsq_id, slice, vtime, enq_flags);
>  }
>  
> +/**
> + * scx_bpf_dsq_insert_begin - Begin a dispatch transaction for a task
> + * @p: task_struct to dispatch
> + *
> + * Returns an opaque u64 dispatch token. Pass the token to
> + * scx_bpf_dsq_insert_commit() to insert @p into a DSQ. If @p is dequeued
> + * or claimed by another path between scx_bpf_dsq_insert_begin() and
> + * scx_bpf_dsq_insert_commit(), the commit will silently fail.
> + *
> + * This API is intended for schedulers that do not implement properly
> + * synchronized dequeue.
> + */
> +__bpf_kfunc u64 scx_bpf_dsq_insert_begin(struct task_struct *p)
> +{
> +	return atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK;
> +}
> +
> +/**
> + * scx_bpf_dsq_insert_commit - Commit a dispatch transaction
> + * @p: task_struct to insert
> + * @dsq_id: DSQ to insert into
> + * @enq_flags: SCX_ENQ_*
> + * @token: token from scx_bpf_dsq_insert_begin()
> + * @aux: implicit BPF argument
> + *
> + * Like scx_bpf_dsq_insert() with slice=0, but commits a dispatch transaction
> + * begun with scx_bpf_dsq_insert_begin(). If @p was dequeued or claimed
> + * between begin and commit, the dispatch is silently discarded. Use
> + * scx_bpf_task_set_slice() to set a non-default slice.
> + *
> + * Returns %true if the entry was buffered for dispatch, %false on preamble
> + * failure (e.g. @p is not owned by this scheduler). Note: stale token
> + * detection fires asynchronously in finish_dispatch() after ops.dispatch()
> + * returns. A %true return does not guarantee the task was actually dispatched.
> + */
> +__bpf_kfunc bool scx_bpf_dsq_insert_commit(struct task_struct *p,
> +					    u64 dsq_id, u64 enq_flags,
> +					    u64 token,
> +					    const struct bpf_prog_aux *aux)
> +{
> +	struct scx_sched *sch;
> +
> +	guard(rcu)();
> +	sch = scx_prog_sched(aux);
> +	if (unlikely(!sch))
> +		return false;
> +
> +	if (!scx_dsq_insert_preamble(sch, p, dsq_id, &enq_flags))
> +		return false;
> +
> +	p->scx.slice = p->scx.slice ?: 1;
> +
> +	scx_dsq_insert_buf(sch, p, dsq_id, enq_flags, (unsigned long)token);
> +
> +	return true;
> +}
> +
>  __bpf_kfunc_end_defs();
>  
>  BTF_KFUNCS_START(scx_kfunc_ids_enqueue_dispatch)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert___v2, KF_IMPLICIT_ARGS | KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_commit, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, __scx_bpf_dsq_insert_vtime, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert_vtime, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_RCU)
>  BTF_KFUNCS_END(scx_kfunc_ids_enqueue_dispatch)
>  
>  static const struct btf_kfunc_id_set scx_kfunc_set_enqueue_dispatch = {
> @@ -10194,6 +10255,7 @@ BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
>  BTF_ID_FLAGS(func, scx_bpf_task_running, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_task_cid, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_IMPLICIT_ARGS)
>  BTF_ID_FLAGS(func, scx_bpf_locked_rq, KF_IMPLICIT_ARGS | KF_RET_NULL)
>  BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_IMPLICIT_ARGS | KF_RET_NULL | KF_RCU_PROTECTED)
> diff --git a/tools/sched_ext/include/scx/common.bpf.h b/tools/sched_ext/include/scx/common.bpf.h
> index 5f715d69cde6..fb793008e2e3 100644
> --- a/tools/sched_ext/include/scx/common.bpf.h
> +++ b/tools/sched_ext/include/scx/common.bpf.h
> @@ -63,6 +63,8 @@ s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
>  s32 __scx_bpf_select_cpu_and(struct task_struct *p, const struct cpumask *cpus_allowed,
>  			     struct scx_bpf_select_cpu_and_args *args) __ksym __weak;
>  bool __scx_bpf_dsq_insert_vtime(struct task_struct *p, struct scx_bpf_dsq_insert_vtime_args *args) __ksym __weak;
> +u64 scx_bpf_dsq_insert_begin(struct task_struct *p) __ksym __weak;
> +bool scx_bpf_dsq_insert_commit(struct task_struct *p, u64 dsq_id, u64 enq_flags, u64 token) __ksym __weak;
>  u32 scx_bpf_dispatch_nr_slots(void) __ksym;
>  void scx_bpf_dispatch_cancel(void) __ksym;
>  void scx_bpf_kick_cpu(s32 cpu, u64 flags) __ksym;
> -- 
> 2.48.1
> 

  parent reply	other threads:[~2026-05-09 21:24 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-09 19:11 [PATCH v3 sched_ext/for-7.2 0/2] sched_ext: Add dispatch transaction API Cheng-Yang Chou
2026-05-09 19:11 ` [PATCH v3 1/2] " Cheng-Yang Chou
2026-05-09 19:49   ` sashiko-bot
2026-05-09 21:24   ` Andrea Righi [this message]
2026-05-10 14:02     ` Tejun Heo
2026-05-10 14:58       ` Andrea Righi
2026-05-15 14:50         ` Cheng-Yang Chou
2026-05-10 14:06   ` Tejun Heo
2026-05-15 14:36     ` Cheng-Yang Chou
2026-05-09 19:11 ` [PATCH v3 2/2] selftests/sched_ext: Add dispatch_cookie test Cheng-Yang Chou
2026-05-09 20:16   ` sashiko-bot
2026-05-09 21:43   ` Andrea Righi
2026-05-15 13:42     ` Cheng-Yang Chou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af-l_M9p9npc9FUA@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=chia7712@gmail.com \
    --cc=jpiecuch@google.com \
    --cc=jserv@ccns.ncku.edu.tw \
    --cc=sched-ext@lists.linux.dev \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    --cc=yphbchou0911@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.