From: Andrea Righi <arighi@nvidia.com>
To: Cheng-Yang Chou <yphbchou0911@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, Kuba Piecuch <jpiecuch@google.com>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	Ching-Chun Huang <jserv@ccns.ncku.edu.tw>,
	Chia-Ping Tsai <chia7712@gmail.com>
Subject: Re: [PATCH v2 sched_ext/for-7.1] sched_ext: Invalidate dispatch decisions on CPU affinity changes
Date: Wed, 22 Apr 2026 13:02:55 +0200
Message-ID: <aeiq3-TGABRgOrD_@gpd4>
In-Reply-To: <20260422142633.G7180@cchengyang.duckdns.org>

Hi Cheng-Yang,

On Wed, Apr 22, 2026 at 02:33:40PM +0800, Cheng-Yang Chou wrote:
> Hi Tejun, Andrea, and Kuba,
> 
> On Mon, Mar 23, 2026 at 01:13:20PM -1000, Tejun Heo wrote:
> > > The simple way to do this is to do scx_bpf_dsq_insert() at the very beginning,
> > > once we know which task we would like to dispatch, and cancel the pending
> > > dispatch via scx_bpf_dispatch_cancel() if any of the pre-dispatch checks fail
> > > on the BPF side. This way, the "critical section" includes BPF-side checks, and
> > > SCX will ignore the dispatch if there was a dequeue/enqueue racing with the
> > > critical section.
> > > 
> > > With this solution, we can throw an error if task_can_run_on_remote_rq() is
> > > false, because we know that there was no racing cpumask change (if there was,
> > > it would have been caught earlier, in finish_dispatch()).
> > 
> > Yeah, I think this makes more sense. qseq is already there to provide
> > protection against these events. It's just that the capturing of qseq is too
> > late. If insert/cancel is too ugly, we can introduce another kfunc to
> > capture the qseq - scx_bpf_dsq_insert_begin() or something like that - and
> > 	stash it in a per-cpu variable. That way, qseq would cover the "current"
> > 	queued instance and the existing qseq mechanism would be able to reliably
> > 	ignore the ones that lost the race to dequeue.
> 
> Since this has been stale for a while, I prepared a patch to implement
> scx_bpf_dsq_insert_begin() as suggested.
> 
> Is anyone else working on this? If not, I'm happy to send the formal
> patch to fix this.

It's on my TODO list. If you have time to work on this, go ahead and
send the updated patch. I'll take a look at it ASAP.

Thanks,
-Andrea

> 
> -- 
> Cheers,
> Cheng-Yang
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 0a53a0dd64bf..0215a21a02db 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -7933,6 +7933,7 @@ static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
>  {
>  	struct scx_dsp_ctx *dspc = &this_cpu_ptr(sch->pcpu)->dsp_ctx;
>  	struct task_struct *ddsp_task;
> +	unsigned long qseq;
>  
>  	ddsp_task = __this_cpu_read(direct_dispatch_task);
>  	if (ddsp_task) {
> @@ -7945,9 +7946,16 @@ static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
>  		return;
>  	}
>  
> +	if (dspc->insert_begin_valid) {
> +		qseq = dspc->insert_begin_qseq;
> +		dspc->insert_begin_valid = false;
> +	} else {
> +		qseq = atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK;
> +	}
> +
>  	dspc->buf[dspc->cursor++] = (struct scx_dsp_buf_ent){
>  		.task = p,
> -		.qseq = atomic_long_read(&p->scx.ops_state) & SCX_OPSS_QSEQ_MASK,
> +		.qseq = qseq,
>  		.dsq_id = dsq_id,
>  		.enq_flags = enq_flags,
>  	};
> @@ -7955,6 +7963,39 @@ static void scx_dsq_insert_commit(struct scx_sched *sch, struct task_struct *p,
>  
>  __bpf_kfunc_start_defs();
>  
> +/**
> + * scx_bpf_dsq_insert_begin - Snapshot qseq before a dispatch decision
> + * @p: task_struct being considered for dispatch
> + * @aux: implicit BPF argument to access bpf_prog_aux hidden from BPF progs
> + *
> + * Capture @p's qseq before the BPF scheduler reads @p's properties (e.g.
> + * cpus_ptr) to make a dispatch decision. The snapshot is used by the
> + * subsequent scx_bpf_dsq_insert() call, extending the race detection window
> + * to cover any BPF-side checks between this call and the insert. If a
> + * concurrent dequeue/re-enqueue races within this window, finish_dispatch()
> + * detects the qseq mismatch and discards the stale dispatch.
> + */
> +__bpf_kfunc void scx_bpf_dsq_insert_begin(struct task_struct *p,
> +					  const struct bpf_prog_aux *aux)
> +{
> +	struct scx_sched *sch;
> +	struct scx_dsp_ctx *dspc;
> +
> +	guard(rcu)();
> +
> +	sch = scx_prog_sched(aux);
> +	if (unlikely(!sch))
> +		return;
> +
> +	if (!scx_kf_allowed(sch, SCX_KF_ENQUEUE | SCX_KF_DISPATCH))
> +		return;
> +
> +	dspc = &this_cpu_ptr(sch->pcpu)->dsp_ctx;
> +	dspc->insert_begin_qseq = atomic_long_read(&p->scx.ops_state) &
> +				  SCX_OPSS_QSEQ_MASK;
> +	dspc->insert_begin_valid = true;
> +}
> +
>  /**
>   * scx_bpf_dsq_insert - Insert a task into the FIFO queue of a DSQ
>   * @p: task_struct to insert
> @@ -8134,6 +8175,7 @@ __bpf_kfunc void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64 dsq_id,
>  __bpf_kfunc_end_defs();
>  
>  BTF_KFUNCS_START(scx_kfunc_ids_enqueue_dispatch)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_insert_begin, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_dsq_insert___v2, KF_IMPLICIT_ARGS | KF_RCU)
>  BTF_ID_FLAGS(func, __scx_bpf_dsq_insert_vtime, KF_IMPLICIT_ARGS | KF_RCU)
> diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> index 4a7ffc7f55d2..adc4f1c01b56 100644
> --- a/kernel/sched/ext_internal.h
> +++ b/kernel/sched/ext_internal.h
> @@ -989,6 +989,8 @@ struct scx_dsp_ctx {
>  	struct rq		*rq;
>  	u32			cursor;
>  	u32			nr_tasks;
> +	unsigned long		insert_begin_qseq;
> +	bool			insert_begin_valid;
>  	struct scx_dsp_buf_ent	buf[];
>  };
>  
> diff --git a/tools/sched_ext/scx_central.bpf.c b/tools/sched_ext/scx_central.bpf.c
> index 64dd60b3e922..fb68a7d7e201 100644
> --- a/tools/sched_ext/scx_central.bpf.c
> +++ b/tools/sched_ext/scx_central.bpf.c
> @@ -155,6 +155,8 @@ static bool dispatch_to_cpu(s32 cpu)
>  		* reflect the migration-disabled state yet if
>  		* migrate_disable_switch() hasn't run.
>  		*/
> +		scx_bpf_dsq_insert_begin(p);
> +
>  		if (!bpf_cpumask_test_cpu(cpu, p->cpus_ptr) ||
>  		   (is_migration_disabled(p) && scx_bpf_task_cpu(p) != cpu)) {
>  			__sync_fetch_and_add(&nr_mismatches, 1);
> -- 
> 2.48.1
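
For illustration, a minimal sketch of the insert-then-cancel pattern described
earlier in the thread. This snippet is not from the discussion: the callback
name and the pick_one_task() helper are hypothetical, and task reference
acquisition/release is omitted for brevity.

/*
 * Illustrative only: the dispatch decision runs inside the "critical
 * section" opened by scx_bpf_dsq_insert(), so the qseq captured at insert
 * time covers the BPF-side checks.
 */
void BPF_STRUCT_OPS(example_dispatch, s32 cpu, struct task_struct *prev)
{
	struct task_struct *p = pick_one_task();	/* hypothetical helper */

	if (!p)
		return;

	/* Insert first so the qseq snapshot covers the checks below. */
	scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | cpu, SCX_SLICE_DFL, 0);

	if (!bpf_cpumask_test_cpu(cpu, p->cpus_ptr)) {
		/* Pre-dispatch check failed: drop the pending dispatch. */
		scx_bpf_dispatch_cancel();
		return;
	}
}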

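And a corresponding sketch using the proposed scx_bpf_dsq_insert_begin()
kfunc, assuming the API lands as posted in the patch above. The structure
loosely follows scx_central's dispatch_to_cpu(); the function name here is
made up and reference handling is again omitted.

static bool try_dispatch_to_cpu(struct task_struct *p, s32 cpu)
{
	/* Snapshot p's qseq before reading p->cpus_ptr for the decision. */
	scx_bpf_dsq_insert_begin(p);

	if (!bpf_cpumask_test_cpu(cpu, p->cpus_ptr))
		return false;	/* e.g. requeue the task and try another */

	/*
	 * Consumes the snapshot above: if a dequeue/re-enqueue raced with
	 * the cpumask check, finish_dispatch() sees a qseq mismatch and
	 * discards this dispatch instead of erroring out.
	 */
	scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | cpu, SCX_SLICE_INF, 0);
	return true;
}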

Thread overview: 23+ messages
2026-03-19  8:35 [PATCH v2 sched_ext/for-7.1] sched_ext: Invalidate dispatch decisions on CPU affinity changes Andrea Righi
2026-03-19 10:31 ` Kuba Piecuch
2026-03-19 13:54   ` Kuba Piecuch
2026-03-19 21:09   ` Andrea Righi
2026-03-20  9:18     ` Kuba Piecuch
2026-03-23 23:13       ` Tejun Heo
2026-04-22  6:33         ` Cheng-Yang Chou
2026-04-22 11:02           ` Andrea Righi [this message]
2026-04-23 13:32           ` Kuba Piecuch
2026-04-26  1:47             ` Cheng-Yang Chou
2026-04-27  9:06               ` Kuba Piecuch
2026-05-01 16:19                 ` Cheng-Yang Chou
2026-05-04  8:00                   ` Kuba Piecuch
2026-05-04 21:24                     ` Tejun Heo
2026-05-04 21:58                       ` Andrea Righi
2026-05-05  8:35                         ` Cheng-Yang Chou
2026-05-05  8:01                       ` Kuba Piecuch
2026-05-05  8:31                         ` Tejun Heo
2026-05-05  9:13                           ` Kuba Piecuch
2026-05-05 15:14                             ` Tejun Heo
2026-05-05 15:58                           ` Cheng-Yang Chou
2026-03-19 15:18 ` Kuba Piecuch
2026-03-19 19:01   ` Andrea Righi
